At WorkFusion, we build software that is changing the world and transforming workplaces. Our technology automates repetitive, data-intensive work so people can be freed from the mundane to pursue the meaningful, companies can grow further, and customers can be served faster and better.
WorkFusion is increasingly recognized as the world leader in industry-specific process automation, offering AI-powered software with particular focus on the needs of banking, financial services, and insurance enterprises. Our Intelligent Automation Cloud combines RPA, machine learning and analytics in one unrivaled platform that can be deployed quickly and scale without limit. We compete in the world’s fastest-growing software segment and we are growing at record pace with customers spanning the globe.
As the health and safety of our teams is always a primary concern, we are currently a primarily remote workforce worldwide. Officially, our headquarters is in New York City (on Wall Street) with additional hubs in Canada, Europe and Asia.
The ideal candidate is self-driven, data-driven, and able to work in a distributed team. This professional has strong knowledge of Site Reliability Engineering and DevOps methodologies related to Delivery solutions and Platform Automation. In this role, you will lead the Site Reliability team, sharing your experience in the field with our Delivery, Support, Product Engineering, and Infrastructure teams. You will focus simultaneously on technical excellence and on ensuring the team quickly delivers value to customers who have deployed our software in production. The person who fills this role is a subject matter expert who excels in collaboration, open communication, and reaching across functional borders.
● Manage a team of SREs, providing guidance, mentorship, and technical leadership, and serve as the primary point of contact for coordination with the Support and Product organizations
● Provide support for Site Reliability / DevOps driven solutions for cloud and on-premises environments, and troubleshoot issues with applications and middleware components
● Take learnings from the field and own the process of getting the resulting improvements and fixes back into the product and WorkFusion documentation
● Work on Ansible-based product installers and automation scripts
● Build and support Monitoring Systems around the product, as well as highly available and scalable services
● Troubleshoot MySQL and Postgres DBs and the physical and virtual resources on which they run for optimal performance
● Architect and implement increasingly better HA, DR, and backup solutions
● Create the vision for, and improve, the whole lifecycle of services, from inception and design through deployment, operation, and refinement. This includes researching gaps in automation and laying out the plan to remove them.
● Recommend and implement strategies, policies, and procedures by evaluating organizational outcomes, identifying problems, evaluating trends, and anticipating requirements.
● Show ownership of customer success with WorkFusion platform management.
● Partner with Delivery, Engineering, and Product to steer SRE alignment and strategy to ensure the reliability of WorkFusion platform deployments
● Respond to client reliability concerns and drive agile problem resolution.
● Lead resource management and efficiency strategy. Provide technical leadership development and recruitment inside the organization.
● Collaborate with our growing DevOps/Infra team to build and iterate on our infrastructure to improve reliability and performance
● Identify parts of the system that do not scale, provide immediate palliative measures, and drive long-term resolution of these incidents.
● Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.
● Maintain advanced knowledge of the platform's features and functionality
● Provide L3 escalation support, offering expert assistance to minimize business outages.
● Document solutions and techniques for resolving issues, ensuring information is available to the team through technical notes and the internal knowledge base
● Strong expertise (7-10 years) in administration and engineering of Linux and Windows OS (Amazon Linux, Red Hat, CentOS, Windows Server 2016)
● Hands-on experience (>3 years) working with Tomcat or other Java servlet containers
● Practical knowledge of administering and tuning web servers (Nginx), application servers, and databases (MySQL, PostgreSQL, MongoDB/MSSQL)
● Proficiency in Bash (> 3 years)
● Familiarity with Windows systems and their services (Microsoft SQL Server a huge plus)
● Strong knowledge of AWS, Azure, or GCP cloud services - EC2, ASG, LB, KMS, S3, Route53, Azure LB
● Solid experience (>2 years) with Ansible or a similar configuration management tool
● Deep understanding of CI/CD tools (Jenkins/Sonar/Nexus)
● Secret management software (HashiCorp Vault), RabbitMQ, Marathon, Mesos
● VMware (Virtualization/Hypervisor)
● Advanced Storage Knowledge
● Scripting languages (Bash, PowerShell)
● System Analysis
● Monitoring and alerting experience (ELK)
● Databases (MSSQL, MySQL, PostgreSQL)
● Network administration, DNS, TCP/IP, Security, PKI Certificate management
Would be a plus
● Familiarity with ELK stack; Grafana/Splunk
● Practical knowledge of HashiCorp Vault
● Experience with Java development
● Deep understanding of the Linux kernel and networking
● Smartest people in the industry and the most interesting product in the global market
● Competitive salary
● Indefinite-term employment contract
● Opportunity to work remotely
● Comprehensive social benefits package, including:
o Health insurance covering the best medical centers for you and your family
o Sports expenses compensation
o Psychologist services compensation
o Professional and English-language training
o Team activities