The Systems Administrator will be responsible for the overall infrastructure design and implementation of a highly-available web-based system supporting multiple applications using Amazon Web Services (AWS). This individual will be responsible for effective architecture design, provisioning, installation/configuration, operation, and maintenance of cloud infrastructure and software using AWS. This individual will participate in technical research and development to enable continuing innovation within the infrastructure. This individual will ensure that cloud infrastructure, operating systems, software systems, and related procedures adhere to all applicable security regulations.
This individual will assist project teams with technical issues in the Initiation and Planning phases of development. These activities include the definition of needs, benefits, and technical strategy; research and development within the project life cycle; technical analysis and design; and support of operations staff in executing, testing, and rolling out the solutions.
Qualified applicants must be detail-oriented, result-driven individuals who can take ownership of a project and can work well without direct oversight. They must have good interpersonal skills, as well as good client interaction skills.
Design overall architecture for highly-available AWS system using EC2, RDS, S3, and DynamoDB services.
Provision new / rebuild existing cloud servers and configure services, settings, directories, storage, permissions, etc. in accordance with standards and project/operational requirements.
Develop and maintain installation, configuration and operations procedures.
Manage a team of infrastructure specialists to assist in the implementation and support of systems.
Perform regular system monitoring, verifying the integrity and availability of all cloud systems, server resources, systems and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups.
Perform regular security monitoring to identify any possible intrusions.
Provide Tier III/other support per request from various constituencies. Investigate and troubleshoot issues.
Repair and recover from system or software failures. Coordinate and communicate with impacted constituencies.
Install, configure and manage network services to support the project
Install, configure and support Operating Systems - Red Hat/CentOS Linux
Responsible for cloud infrastructure and software, including maintenance, support and updates
Maintain and update Infrastructure documentation
Provide assistance with scripts (writing, troubleshooting) to accomplish system administration tasks as needed
Provide general system administration support
Participate in deployment and configuration management activities
Perform patch management and firmware evaluation, provide recommendations and reports, and assist and advise in the creation of standardized build templates
Assists the technical team with systems analysis, system and application maintenance, capacity planning, and diagnostics, making recommendations and providing reports. Activities may include:
Troubleshoot operating system and network problems and prepare problem reports, especially for advanced or non-routine problems.
Troubleshoot system and software issues, especially non-routine issues, and recommend fixes.
Performance Analysis, troubleshooting, and problem reports
Provide Capacity Planning Analysis and Reports
Assists with the configuration and testing of additional inter-domain networking and alternate path and/or dynamic reconfiguration. Activities may include:
Configure and test any inter-domain networking
Configure and test alternate pathing and/or dynamic reconfiguration
Compile and document inter-domain networking, alternate pathing and/or dynamic reconfiguration activities in an operational guide.
Provides expertise to address architectural and process initiatives. Activities may include:
Assist with the planning, evaluation and implementation of enterprise systems re-architecture/system refresh.
Evaluate and recommend new environment architecture including cluster redesign, virtualization options, filesystem design.
Address backup architecture and the use of technologies such as clones, snaps, and so forth
Assist with customer server performance issues
Serve as lead technical liaison in technical meetings
Develops and maintains detailed and accurate documentation. Documentation may be required for operational procedures, troubleshooting aids, and technical analyses for products, features, and capabilities.
Requirements and Experience Guidelines:
Must be a U.S. Citizen or U.S. Legal Permanent Resident
AWS System administration experience required
Demonstrated understanding of System Administration for AWS Unix servers
Experience with RDS and/or DynamoDB services desired
Network administration experience desired
Experience with managing Tomcat servers in AWS environment
Knowledge of web technologies, preferably Apache/Tomcat
Understanding of backup and disaster recovery processes and configuration
Strong troubleshooting skills to resolve infrastructure-related problems
Experience with source control, unit testing, and continuous integration tools is a plus
Ability to learn new technologies quickly
Demonstrated excellent communication skills including the ability to effectively communicate with internal and external customers
Ability to support clustered systems in an enterprise production 24x7 environment
At least 5 years of experience configuring, installing, and supporting AWS infrastructure.
At least 5 years of experience configuring, testing, and evaluating the networks needed to support cluster configurations.
At least 5 years of experience evaluating and deploying software tools in an AWS production environment.
Excellent project management skills in order to plan for upgrades to the cluster environment.
Excellent problem management and troubleshooting skills.
Excellent verbal and written communication skills to be able to ascertain user requirements and prepare documentation.
Excellent customer interface skills. Demonstrated ability to deal with customers in a challenging environment.
At least 5 years of experience in cluster system design and ability to provide system architecture advice.
CloudFormation experience desired.
Bachelor’s degree in Computer Science, Engineering, or related field required
EXTRA NOTES Regarding the Opportunity:
1. What stage are they in with the project? Designing and planning or Mid-way?
Answer: Designing and planning. This is the FirstNet project that went on hold then kicked off in March.
2. How many users are involved?
Answer: A lot! We are trying to find a number or estimate; however, the FirstNet website doesn’t publish that. This is a nationwide emergency system. It is available to all first responders (EMT, Firefighters, Paramedics, Police, National Guard) during major emergencies (hurricane, tornado, etc.). So it will service a lot of people.
3. What are they trying to accomplish with this system? Build?
Answer: https://www.firstnet.gov/about . In essence we’re partnering with AT&T to build this system. They are providing the infrastructure and we are providing the strategy and development.
4. How many environments do they have, and how large is the team the potential systems admin/infrastructure manager will be working with?
Answer: Our team is large, split into 4 different groups (each 4-10 people) in Arlington and New York. This is a long-term project with plenty of runway.
|Salary||0 to 0|
|Years of Experience ||5+ to 10 years|
|Minimum Education ||-|
|Willingness to Travel||-|
|Hours per week||0|