Senior DevOps Engineer
About the position
Fabric makes profitable on-demand e-commerce a reality. Its flexible micro-fulfillment solution was specifically designed to enable fast fulfillment from small spaces. By leveraging robotic automation, Fabric allows retailers to reduce costs and cut fulfillment times. Unlike any other micro-fulfillment solution, Fabric’s software-led robotics and modular approach give retailers the flexibility to build the fulfillment center that fits their requirements, allowing them to fulfill online orders at maximum speed while ensuring profitability. Retailers can choose a platform model, which they run and operate independently on their own real estate, or a service model, in which fulfillment is offered as a service with minimal capex investment. Founded in 2015, Fabric has raised $138 million to date and is backed by Aleph, Corner Ventures, Canada Pension Plan Investment Board (CPPIB), Innovation Endeavors, La Maison, Playground Ventures, and Temasek. With offices in New York City and Tel Aviv, Fabric is rapidly expanding its U.S. operations, with more than 170 team members globally and 15 sites under development/contract, including two live micro-fulfillment centers.
Senior Site Reliability Engineer
We are looking for a technology leader with multidisciplinary technological skills across engineering, infrastructure and methodologies to join our Software Engineering team in Tel Aviv.
Fabric Software Engineering is a team of brilliant and talented engineers, algorithm developers, data scientists and researchers, all dedicated to building a robust, highly scalable robotic fulfillment network. We're responsible for everything from orchestrating the robots' movement to handling orders and managing stock across multiple fulfillment centers. We solve challenging problems through collaboration across different roles and paradigms using cutting-edge technology. We have a diverse tech stack, with a backend organized in a microservices architecture that helps keep it flexible. We're investing in a DevOps culture, which means end-to-end ownership from supporting the feature design to monitoring the production deployment.
As our business grows, the team is developing fast as well, opening new opportunities for both professional and personal growth.
As Senior Site Reliability Engineer (SRE), you will design and implement solutions for the company’s production infrastructure, CI/CD pipelines and observability for a business-critical production environment. You will write code and leverage managed solutions, open-source tools and industry best practices to ensure that our production environment is highly available to serve our customers.
Responsibilities
- Design and implement solutions for CI/CD, including simulating new code to test potential improvements or issues
- Ensure a highly reliable system across both cloud and on-prem deployments
- Manage our Kubernetes cluster
- Provide other engineers with observability solutions (including monitoring and logging) for our production deployment
- Support engineering teams performing development on IoT hardware components
Requirements
- At least 3 years of software engineering experience
- Good software engineering skills with experience in running solutions in production; hands-on approach
- Experience with Docker and with running Kubernetes in production environments
- Experience with modern (containerized) CI/CD solutions
- Experience with Python is an advantage
- Experience with Terraform is an advantage
- An excellent collaborator with great communication skills, who can take part in a diverse cross-functional team
The perfect candidate is someone who:
- Is thrilled when they solve complex engineering and architecture problems
- Enjoys solving infrastructure problems, like scaling our Prometheus deployment to support increasing workloads
- Is driven by the desire to make a big impact
- Enjoys an enabling role, supporting the team’s velocity and practices