System Engineer - Cloud Operations (Sunnyvale, CA)

Aerohive is seeking a Cloud Ops System Engineer to join our Tech Operations team. This individual will be responsible for setting up, maintaining, and monitoring production servers and applications on a 24x7 basis. This individual will also be responsible for production system design, reviewing capacity utilization, and performing capacity planning and expansion. This individual must have strong troubleshooting and problem-solving skills, including application and system/network-level troubleshooting ability.

Responsibilities:

  • Build new servers and install and set up the Linux OS and application stacks for production systems.
  • Set up, maintain, and monitor production servers and applications on a 24x7 basis.
  • Review capacity utilization and perform capacity planning and expansion.
  • Manage production system, application, and network security, including firewalls and policies.
  • Participate in system and infrastructure design, implementation, deployment and maintenance.
  • Participate in monitoring system design, implementation, deployment and maintenance.
  • Evaluate and develop tools to automate the deployment, administration and monitoring of a large-scale Linux environment.
  • Troubleshoot, diagnose and fix production application and system issues.
  • Participate in a 24x7 on-call duty rotation.

Basic Qualifications:

For this position, you must have:

  • A degree in computer science or a related field.
  • Proven experience running and maintaining a 24x7 Internet-oriented production environment, preferably across multiple data centers and hundreds of machines.
  • 3-5 years’ experience in Linux systems management and scripting.
  • Familiarity with SQL and relational databases such as Oracle and MySQL.
  • Ability to learn technical concepts quickly with a strong sense of urgency.
  • Ability to write scripts in an administrative language (Python, Perl, or shell).
  • Experience managing web application server software such as Apache and Tomcat.
  • Strong troubleshooting and problem-solving skills, including application and system/network-level troubleshooting ability.
  • Familiarity with various High Availability, Backup and Disaster Recovery strategies and architectures.
  • Strong verbal and written English skills.

Preferred Qualifications:

The ideal candidate will have:

  • Experience designing and/or implementing system health and performance monitoring tools for 24x7 environments.
  • Experience managing systems in a public cloud environment such as Amazon Web Services (AWS).
  • Experience in the design and implementation of automation of operational tasks.
  • Exposure to database development or administration, MySQL or PostgreSQL preferred.
  • Solid grasp of networking fundamentals, including load balancers, switches and routers.
  • Experience with performance tuning, especially of Tomcat/Apache servers and PostgreSQL/MySQL databases.
  • Exposure to application and infrastructure security.
  • Excellent communication skills and the ability to work well in a team.