This week
  • O'Reilly Auto Parts
    Preferred: (GMT-06:00) Central Time

    Have you ever heard of O, O, O, O'Reilly Auto Parts…Ow?! This is not your standard System Engineer position and we are not your standard brand! We are the dominant auto parts retailer in all our market areas.

    Our infrastructure teams work on projects adding directly to the O’Reilly Auto Parts bottom line and we are looking for exceptional Engineers and Admins to help us succeed! Some of the tools we use to implement our projects are Linux, Puppet, Git, Jenkins, Ansible, and other open source tools and technologies. We also utilize collaboration tools such as Jira and Confluence.

    What we look for in our Team Members:

    • Love solving complex problems related to serving our customers better – both internal & external customers
    • Enjoy working with teams
    • Senior-level experience with Linux and automation
    • Experience with documentation
    • An ambition to always learn and grow

    About our team:

    • We are a “work family”! We have fun together and support each other
    • We respect a healthy work-life balance
    • We are responsible for maintaining our Linux infrastructure, which consists of over 2,000 servers
    • The team keeps open communication through different outlets – video conferencing, team messaging applications, and daily stand-up meetings
    • Our managers really value collaboration between team members and encourage them to bring forth creative problem-solving ideas from both a technical and functional aspect

    Growth within our teams at O’Reilly Auto Parts:

    • We have several career paths, whether you want to be a supervisor, manager, or architect – there’s a documented growth plan to help you follow the path you choose
    • We want to grow our people – we help to make you better by providing training for both technical and professional development
    • We look to promote from within – O’Reilly is diligent about promoting qualified team members from inside our organization
This month
  • Platform.sh

    This role is open to full-time remote candidates.

    Platform.sh is a groundbreaking hosting and development tool for web applications. We’re a European VC-backed startup with a host of blue-chip enterprise clients and a string of awards and grants (including €2m from the EU Horizon 2020 program).

    To reinforce our technical prowess, we are looking to grow our operations team. If you’re looking for an exciting, high-growth opportunity with an award-winning, cutting-edge company, this could be just the job for you.

    For its PaaS solution, Platform.sh (https://platform.sh) is looking for an Operations and Service Reliability Engineer with a taste for Python and Go, a strong understanding of Linux systems, and a real hunger for the challenges of building robust, distributed systems.

    Platform.sh is a PaaS shrouded in a lot of black magic (we can consistently clone a whole running cluster, with its state, databases and indexes, in a matter of seconds). We want to get this down to hundreds of milliseconds. Interested? There is more…

    From the same manifest, we can consistently generate a Docker container, an LXC container, or VM disk images (AWS, Azure, OpenStack). We want more targets.

    We probably have the highest container density in the industry. We need to get it higher.

    We support Python, Ruby, NodeJS, PHP, Java and .NET.

    Directly reporting to our Director of Infrastructure and in close interaction with our Engineering and Customer Support teams, you will be responsible for:

    • cloud operations: configure clusters, deploy stuff, follow-up on alerts, help customer support debug issues, all in Microsoft Azure
    • automating all of the above so they can instead drink margaritas (or non-alcoholic beverages, of course)
    • creating systems, tools & processes that will enhance our support and operations efficiency
    • improving service quality, discipline and reliability throughout the lifecycle
    • monitoring operating objectives, streamlining and automating intervention
    • continuous learning from Operations experience, modeled as software

    The ideal candidate:

    • has proven successful experience in an operations role
    • has demonstrated the ability to successfully manage cloud-based infrastructure for a fast growing organization
    • has experience with containerization technologies
    • has had exposure to cloud services (Azure)
    • understands how an OS works, knows networking, how git works, and the constraints of a distributed system
    • has Puppet experience
    • is proficient in Python (Golang a plus)

    Nice to have:

    • knowledge of Magento Ecommerce, Symfony, Drupal, eZ Platform, or Typo3
    • relational database skills

    Note: We don't like stress, so we build everything to be robust and resilient, but stuff does break. This is a role with on-call duties. If pager duty fills you with dread… well, this might not be a fit.

  • Wikimedia Foundation, Inc.

    Location: San Francisco, CA or Remote

    Summary

    We are looking for a Site Reliability Engineer to directly support our application platform serving the world’s favorite encyclopædia to millions of people around the globe. Wikipedia and its sister projects are powered strictly by Free and Open Source software, with MediaWiki at its core surrounded by an ecosystem of microservices in PHP, NodeJS, Python, Go and Java.

    We are a distributed and diverse team of engineers with a drive to explore, experiment and embrace new technologies. During the past few years we have been transitioning our platform from a monolith to a hybrid, microservices architecture, and started migrating our microservices onto Kubernetes. We’ve adopted Elastic Stack and Prometheus as our de facto logging and monitoring platforms and are improving our automation (we ❤️ automation).

    If you find what we do interesting, if you are up to the challenge of improving the reliability and delivery of one of the Internet’s top 10 websites, and you enjoy the idea of working with a globally distributed team, you might be just the person we need. Come as you are!

    Responsibilities

    • Ensure smooth and reliable operation of the MediaWiki application platform, the surrounding ecosystem of microservices, and their dependencies (Memcached, Redis, Kafka, etcd, …)

    • Perform platform transformations and migrations towards modernized infrastructure (HHVM to Zend PHP7, bare metal deployments to Kubernetes clusters, active/active multi-data center support, etc.)

    • Bring your creativity to improve our current infrastructure and introduce new automation where needed

    • Support new code/feature deployments when required

    • Troubleshoot, debug and follow-up on emerging issues in our application stack and its surroundings

    • Perform day-to-day operational/DevOps tasks on Wikimedia’s wider public facing infrastructure (deployment, maintenance, configuration, troubleshooting), as well as reduction of manual, repetitive, automatable tasks (toil).

    • Implement and utilize configuration management and deployment tools (Puppet, Kubernetes)

    • Assist in the architectural design of new services and in making them operate at scale

    • Monitoring of systems, services and service clusters, optimization of performance and resource utilization

    • Incident response, diagnosis and follow-up on system outages or alerts across Wikimedia’s production infrastructure

    • Share our values and work in accordance with them

    Qualifications

    • 3+ years of experience in an SRE/Operations/DevOps role as part of a team
    • Experience in supporting complex web applications running highly available and high traffic infrastructure based on Linux
    • Comfortable with configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.), and modern observability infrastructure (monitoring, metrics and logging)
    • Aptitude for automation and streamlining of tasks
    • Comfortable with shell and scripting languages used in an SRE/Operations engineering context (e.g. Python, Go, Bash, Ruby, etc.)
    • Good understanding of Linux/Unix fundamentals and debugging skills
    • Strong English language skills and ability to work independently, as an effective part of a globally distributed team
    • B.S. or M.S. in Computer Science or equivalent related work experience

    Pluses

    • Experience managing MediaWiki installations is a major plus
    • Track record of open source contributions is highly appreciated
    • Experience running PHP/LAMP stack applications is a plus, especially in geographically distributed environments
    • Familiarity with modern distributed container cluster management systems (Kubernetes, Docker Swarm, Mesos, …)
    • Low level systems troubleshooting and debugging (CPU/memory profiling, C/C++ experience, in-depth Linux knowledge)
    • Experience with advanced distributed storage and database systems (Swift, Ceph, Cassandra, etc.)
    • Familiarity with RFC2549 or similar protocols

    The Wikimedia Foundation is… 

    …the nonprofit organization that hosts and operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge, free of interference. We host the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive financial support from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.

    The Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply.

    U.S. Benefits & Perks*

    • Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)
    • The Wellness Program provides reimbursement for mind, body and soul activities such as fitness memberships, baby sitting, continuing education and much more
    • The 401(k) retirement plan offers matched contributions at 4% of annual salary
    • Flexible and generous time off - vacation, sick and volunteer days, plus 19 paid holidays - including the last week of the year.
    • Family friendly! 100% paid new parent leave for seven weeks plus an additional five weeks for pregnancy, flexible options to phase back in after leave, fully equipped lactation room.
    • For those emergency moments - long and short term disability, life insurance (2x salary) and an employee assistance program
    • Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses
    • Telecommuting and flexible work schedules available
    • Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax
    • Great colleagues - diverse staff and contractors speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people

    *Eligible international workers' benefits are specific to their location and dependent on their employer of record

    More information

    Wikimedia Foundation website

    Wikimedia Foundation blog

    Annual Report - 2017

    Wikimedia 2030
