Site Reliability Engineer

At PostHog

between GMT-8 (Pacific time) and GMT+2 (e.g. Eastern European time)

We are looking for a Site Reliability Engineer to join our Infrastructure and Deployments team.

PostHog is on a mission to increase the number of successful products in the world. Right now we are focusing on helping product creators understand how people are using their product. Where things go well. Where things don't go so well.

We have built a great service for understanding how users use products. Now we need to scale it and make sure it is rock solid. You will be joining a rapidly growing team helping us support the services that allow us to absorb billions of events without loss even during crazy peaks. You will also be helping out with keeping our query engine up during high loads.

We also need to build out the infrastructure needed to deploy this stack on customers' VPCs and equivalents so that they can maintain control over their data without any loss of quality of experience on PostHog.

PostHog is a well-funded startup from Y Combinator's W20 batch, backed by Y Combinator and GV, and is growing rapidly. We love open source, data, and crafting beautiful products that are easy to use and provide clear value.

If you are interested in software infrastructure on Kubernetes and excited to work with an experienced software development team, this job is for you.

Please note - for this role we are looking for candidates between GMT-8 (Pacific time) and GMT+2 (e.g. Eastern European time).

Here is what you will be working on:

  • Building AWS, GCP, Azure (and more!) deployment automation for delivering PostHog to ourselves and customers
  • Troubleshooting networking, compute, and Kubernetes failures.
  • Improving performance and robustness of cloud and customer deployments.
  • Hardening security all around.
  • Working on large scale OLAP databases with huge datasets (we use ClickHouse and love it)
  • Automating ClickHouse scaling for ourselves and our clients
  • Improving how we manage and scale Kafka

Requirements

What you'll bring:

  • Strong experience in Linux systems administration, networks, performance troubleshooting
  • You are super comfortable with Kubernetes. You might even consider yourself an expert.
  • Running and troubleshooting Kubernetes clusters, containerized networking and applications
  • Experience building and maintaining Kubernetes Operators
  • Comfortable engineering in Golang, Rust, or Python
  • Infrastructure and application security engineering experience is a plus.
  • ClickHouse experience is a plus
  • You know the difference between GKE, EKS, DO K8s, and vanilla K8s and how to operate each
  • Even better, you know how to operate a high IO database on each of those ^

What to expect once you apply:

  • You will join a 30-minute intro call that walks you through culture, compensation, the interview process, and requirements.
  • We will send you a 30-minute SRE quiz
  • Technical interview with the hiring team. This is usually 2 PostHog team members spending 45-60 minutes in conversation
  • The final stage is a mix of panel interviews (coding skills and infrastructure), as well as meeting the team

Benefits

What we offer in return:

Sold? Apply now and tell us:

  • In a few sentences, how you can achieve the above
  • Why you're drawn to us
  • Your resumé and/or LinkedIn

Not sold? Learn more first

We believe people from diverse backgrounds, with different identities and experiences, make our product and our company better. No matter your background, we'd love to hear from you! Also, if you have a disability, please let us know if there's any way we can make the interview process better for you; we're happy to accommodate!
