Site Reliability Engineer

Remote
Full Time
R&D
Mid Level

We’re looking for highly motivated, passionate site reliability engineers to join our growing team. At evertz.io, our teams are building services that are used by the biggest names in the exciting broadcast and media industry. Our services are hosted in AWS, with a Serverless First mindset.

As part of this role, you will work with our talented teams to help harden our multi-tenant SaaS platform. Using best-in-class observability tooling, you will debug incidents while identifying and implementing improvements to the platform to ensure its continued reliability. Your drive to eliminate toil will see you automating processes and building the tools to do so.

We offer flexible working hours, great benefits, and the freedom to experiment with new technologies and tools to build better products.

Skills and experience you will bring:

  • 2 years of experience managing critical production infrastructure and maintaining reliability and uptime of applications
  • 2 years of experience with monitoring, log-aggregation, and observability services such as Datadog, CloudWatch, Honeycomb, Splunk, or New Relic
  • 2 years of experience implementing and managing production CI/CD pipelines using modern deployment mechanisms such as blue/green deployment
  • 2 years of experience translating SLOs and SLIs into actionable improvements. Reliability, monitoring, and observability are not just words to you.
  • Solid foundation in Linux systems administration, networking, and security. 

Additional skills and experience that will be useful:

  • Experience with serverless applications running in the cloud.
  • Experience with security frameworks such as OWASP, ISO, CSA and PCI. 
  • Experience conducting threat assessments and creating remediation plans based on the results of threat assessments. 
  • Experience with penetration testing, threat modelling, and open-source and commercial security tools.
  • Experience developing new deployment mechanisms for webapp infrastructure, such as canary, A/B, blue/green, red-line, and other deployment patterns.
  • Deep knowledge of performance tuning of core AWS services like Lambda, DynamoDB, API Gateway, SQS, EventBus, and EC2.
  • Experience with chaos engineering that pushes systems and products to their limits to see how they will respond to unexpected events. 


About the Role

The evertz.io Engineering Team builds next-generation systems for content management and distribution in the Media and Entertainment industry. Disney, NBCUniversal, Discovery, BBC, and many other content producers and publishers use our products and services to make the most of their file-based and live content for the least effort.

We work with high-quality video in real-time and non-real-time scenarios across a wide range of cutting-edge tech. Specializations within the group span from low-level video manipulation and analysis, through back-end management and orchestration services, to web-delivered UIs. Working in parallel with these teams is the Scientific Computing Group, which works in computer vision, data science and machine learning, taking experiments in Jupyter notebooks through to deployment in production. This makes for a challenging and rewarding engineering experience of continual learning, with plenty of opportunity to explore different parts of the stack.

Our technology stack includes a Serverless microservice architecture that capitalizes on the full breadth of AWS services, with code written in Python, Rust and Java. Our UI uses the latest versions of Angular, TypeScript and NgRx. Our CI/CD pipelines leverage AWS, Jenkins, Nexus, and Bazel, in addition to our in-house release-management application, to build and release hundreds of software components.

As a Site Reliability Engineer, you will join our talented and passionate team building evertz.io: a collection of services that will be used by the biggest names in the exciting broadcast and media industry. Our services are hosted in AWS, with a Serverless First mindset.

“Work is a thing you do, not a place you go”

We work in agile, low-bureaucracy, high-creativity, cross-functional teams spread across the world. It’s a highly creative work environment where we support your growth with opportunities for career progression, mentoring others and third-party education. The team is built on trust and is relaxed, open and welcoming to all, and there’s fun to be had with regular social events and sports teams.


As part of this role, you will be expected to:

  • Use various monitoring, log-aggregation, and observability services like AWS CloudWatch and Honeycomb to troubleshoot and resolve issues rapidly
  • Implement and maintain CI/CD pipelines on AWS using CodeCommit, CodePipeline and CodeDeploy
  • Foster a culture of reliability best practices across the evertz.io teams through the use of SLIs and SLOs and by implementing changes directly in the codebase
  • Establish and measure reliability goals such as uptime, downtime, mean time between failures, mean time to resolution, etc.
  • Conduct and document root cause analyses (RCAs) and post-incident reviews
  • Participate in an on-call rotation

Location
This role allows you to work with “Full Flexibility”: for any work where being physically close to fixed equipment is not a requirement, you have the option to work remotely.
Remote working is not the same as working from home; WFH is just one very common option. You can work from wherever gets the creative juices flowing: coffee shops, co-working places, the park, a different country even! Anywhere with Internet access.
Of course, working from an office is an option too, especially if you’re craving some ad hoc in-person interaction! Evertz has offices in Canada, England, Scotland, India, Singapore, Hong Kong, Virginia, California, Arizona, Ohio, Hungary, Belgium, Poland and Australia. Many have great spaces for meet-ups as well as permanent or floating desk space.

Working Hours
This role allows you to work asynchronously, meaning you can contribute at the times when you do your best work. Some people are early birds, some are night owls, and maybe Saturday is better than Wednesday? Whilst some overlap for core meetings is needed, you don’t have to do your deep work between 9 and 5.

Salary & Benefits
We offer a competitive salary with annual performance-based bonus and stock option schemes, as well as a pension plan, an employer-funded health and medical plan, a life insurance plan, long-term disability coverage, paid time off, an employee assistance program, and a discount platform. The availability and specifics of these benefits vary by location; details will be provided during the hiring process.

When you apply to a job on this site, the personal data contained in your application will be collected by Evertz Microsystems Ltd (“Controller”), which is located at 5292 John Lucas Drive, Burlington, Ontario, Canada and can be contacted by emailing [email protected]. Controller’s data protection officer is Nadiera Toolsieram, who can be contacted at [email protected]. Your personal data will be processed for the purposes of managing the recruitment-related activities of Controller and its subsidiaries and affiliates, which include setting up and conducting interviews and tests for applicants, evaluating and assessing the results thereof, and as is otherwise needed in the recruitment and hiring processes. Such processing is legally permissible under Art. 6(1)(f) of Regulation (EU) 2016/679 (General Data Protection Regulation) as necessary for the purposes of the legitimate interests pursued by the Controller, which are the solicitation, evaluation, and selection of applicants for employment.

A complete privacy policy can be found at https://evertz.com/contact/privacy/

Your personal data will be retained by Controller as long as Controller determines it is necessary to evaluate your application for employment. Under the GDPR, you have the right to request access to your personal data, to request that your personal data be rectified or erased, and to request that processing of your personal data be restricted. You also have the right to data portability. In addition, you may lodge a complaint with an EU supervisory authority.
