As a Site Reliability Engineer (SRE) at Upbound, you’ll be a vital part of the production services the company is building its business on. You’ll apply engineering principles to design and build highly reliable, scalable infrastructure and services, deployment pipelines and processes that release updates frequently and safely, and monitoring and alerting systems that ensure it all stays healthy.

In this role, you will be…

* Taking ownership of the health and reliability of the live production service and infrastructure, ensuring that SLOs/SLAs are consistently met
* Designing, building, and automating critical portions of the Upbound Cloud service infrastructure
* Troubleshooting and problem-solving effectively to remediate infrastructure-related issues that affect service health
* Reporting and fixing bugs in private and public projects
* Providing routine maintenance and support of Kubernetes-based infrastructure, including extending the Kubernetes API and functionality via CRD/Controller applications
* Being entrusted to make technology decisions for the business, procuring the right technology, and designing and implementing a self-service solution for the teams that consume Upbound infrastructure
* Collaborating with the development teams to assess and recommend technologies that support the company's organizational needs
* Balancing tradeoffs between enterprise and open source technologies to better serve Upbound
* Supporting the full project lifecycle: discovery, analysis, architecture, design, documentation, building, migration, automation, and production-readiness