Description
TMX Finance
Senior Manager, Platform Engineering / IT Capacity Management and Monitoring, Jul '14 - Mar '17
* Managed a multi-discipline infrastructure team (network, virtualization, OS, middleware, database, and DevOps/automation) supporting a Point of Sale application at more than 1,400 retail locations and 2 data centers, providing over 99% uptime across a 2.5-year period
- Produced Infrastructure Operational Capacity Reports and presented them to executive leadership and engineering teams
- Identified and implemented opportunities for virtual downsizing, application tuning, software licensing, and hardware reallocation to maximize utilization of physical hardware assets without sacrificing performance, reducing hardware and software expenses by an average of $500k per year
- Developed and maintained a 3.5 TB infrastructure utilization and performance data warehouse, collecting data from over 15 different sources into a unified schema for aggregation, reporting, and analysis
- Developed hardware requirement forecast plans based on business volume projections and trending application utilization data, providing hardware capacity as needed while staying under budget
- Developed ad hoc reports for all areas of infrastructure and application performance, from application SLA compliance to storage utilization to database backup completion rates
- Supported SOX compliance and governance efforts through reports and audit reviews of collected monitoring data and events
- Implemented an automated framework for application and infrastructure incident management, reducing manual monitoring and systems checks by over 80%
- Implemented an improved incident alerting and notification system, leading to a 70% reduction in incident response and resolution time
- Designed and implemented disaster recovery and business continuity processes for Point of Sale applications, meeting RTO objectives of 2 hours and RPO targets of 1 hour
- Conducted resource utilization assessments for all application load testing, identifying bottlenecks, resource issues, and performance impacts prior to release deployment
JSP
- Designed and implemented a capacity consumption monitoring capability that identified utilization issues proactively and provided trend analysis for future behavior
* Managed hardware provisioning and forecasting for hardware capacity
- Responsible for troubleshooting and performance management for applications and infrastructure across multiple platforms
- Created proprietary software solutions in Perl, VB.NET, and Java to collect and process data
- Coordinated with the application testing center to properly size applications for production launch, often reducing original hardware estimates by 70% or more while causing zero resource shortfalls
and Java-based applications