AI Performance Engineer - Hp - Houston

Job Description:



We are looking for an experienced performance engineer, technician, or administrator who has tuned server performance for a variety of machine learning and deep learning workloads running on Linux and other platforms. The person in this role needs expertise in installing and configuring Industry Standard Server architecture components, including the servers, storage, and networking necessary for provisioning deep learning and machine learning frameworks and applications.

The person in this role will investigate workload performance by capturing and analyzing workload data for characterization, with the goal of tuning performance through software and hardware configuration changes or other changes to the system. They will work with customers, partners, and internal IT to characterize and tune the performance of their workloads. This person will need strong research skills and the ability to analyze data, build ML and DL models to run on the infrastructure, and support customer models. They will work with the internal software development team to capture workload telemetry and optimize the performance of AI software on HPE systems.

The person in this role will be familiar with Linux kernel mode entry points and trace points. They must have strong Linux experience and must be familiar with a variety of industry tools for capturing and analyzing the performance characteristics of a server workload. This person must understand the performance and capacity of the machine and deep learning workloads to be tested within our lab. They must be able to work independently to ramp up on new industry workloads and to develop research, white papers, and reference architectures for AI on HPE systems.

Experience troubleshooting applications composed of multiple component workloads is highly desired. Experience with Hadoop, Spark, Caffe, Caffe2, TensorFlow, Memcached, or other modern workloads for Big Data, Deep Learning, Machine Learning, and Artificial Intelligence is also highly desired, as are strong verbal and written communication skills.

Responsibilities:

· Installs and configures complex IT infrastructure components (servers, storage, network)
· Develops scripts and configurations for automating deployment
· Performs system-level analysis of server workloads on various HPE platforms running DL and ML code, including on accelerated hardware
· Writes white papers and other guidance documents for AI workload and model selection
· Captures and reviews system performance data, logs, and traces to understand workload behavior
· Develops software and scripts that help analyze AI workload performance data
· Communicates well with external partners and customers
· Works with ISV and IHV engineers in optimizing systems and resolving performance issues
· Documents and reports issues discovered when testing and evaluating the systems
· Communicates project status and concerns to management in a timely manner
· Provides guidance to less-experienced staff members

Education and Experience Required:

· PhD or equivalent level of education in a STEM field or Computer Science
· Experience working with containers and distributed deep learning neural networks
· Experience working with High Performance Computing Servers and HPE Servers
· Experience working with WekaIO, NFS, and Lustre file systems
· C, C++ or functional language programming knowledge is a plus
· Experience with InfiniBand technology and Mellanox IB Routing and Switching is a plus
· Strong statistical or mathematical analysis skills

Knowledge and Skills:

· Must have experience with the Linux command line on multiple Linux distributions
· Python programming ability is required; knowledge of Golang or willingness to learn is a plus
· Experience with machine and deep learning frameworks – must be able to develop models and create programs using available frameworks
· Experience with data analysis and statistical modeling techniques
· Experience with conducting and completing research in AI
· Must have strong problem analysis and problem-solving skills
· Experience with one or more software development or scripting languages
· Experience working with complex, multi-layer software systems in data science
· Experience using software profiling tools to analyze software performance
· Experience using tools to create models for AI
· Aptitude for self-learning; learns new concepts quickly
· Excellent written and verbal communication; mastery of English
· Able to work well in a team environment and perform well under pressure
· Experience with tuning system performance in a benchmarking environment
· Experience with various hypervisor platforms
· Experience with GPU accelerator cards and software stack
· Experience with FPGA accelerators and software stack

Job:
Engineering

Job Level:
Expert



Hewlett Packard Enterprise is EEO F/M/Protected Veteran/Individual with Disabilities.



HPE will comply with all applicable laws related to the use of arrest and conviction records, including the San Francisco Fair Chance Ordinance and similar laws, and will consider for employment qualified applicants with criminal histories.
