Lead Platform & DevOps Engineer

Maksym Puzik

Platform · MLOps · DevSecOps · Remote

I build the platforms that let engineering teams ship faster, safer, and cheaper. 9+ years across AWS, GCP, and Azure — from bare-metal networking to ML pipeline orchestration. I've cut cloud bills by up to 50%, reduced release cycles by 45%, and built observability systems processing 4 trillion datapoints — while keeping security and compliance non-negotiable.

9+
Years of experience
4T+
Datapoints stored
45%
CI/CD cycle reduction
50%
Cloud cost savings
00

About

I build platforms that let engineering teams move fast — without breaking things, burning cloud budget, or keeping security teams up at night.

  • 9+ years from ISP networking and bare-metal Linux to multi-cloud Kubernetes platforms serving millions of users across fintech, eCommerce, and big data domains
  • I treat infrastructure as a product — with internal users, feedback loops, SLOs, and quarterly roadmaps. Developer experience is a metric I optimize, not a feeling
  • My sweet spot is the intersection of platform engineering, ML/AI infrastructure, and FinOps — rare to find all three in one engineer
  • I've driven DORA metric improvements end-to-end: deployment frequency up, change failure rate below 2%, MTTR under 10 minutes
  • Security is never an afterthought — SOC 2, shift-left DevSecOps, and least-privilege IAM are baked into every design decision from day one
  • Led and mentored teams of engineers while staying hands-on — I write Terraform, review PRs, and debug production incidents alongside the team
Internal Developer Platform
Self-service golden paths. Dev teams deploy without waiting on platform. DevEx is the product.
ML / AI Infrastructure
End-to-end ML pipeline orchestration — from feature ingestion to model serving at scale.
Security by Default
SOC 2 readiness, shift-left DevSecOps, least-privilege IAM — non-negotiable from day one.
FinOps & Cost Engineering
30–50% cloud cost reduction. Real savings with P&L accountability, not just rightsizing tips.
01

Industries

Platform engineering problems are universal — compliance requirements, scaling challenges, and cost pressures show up everywhere. Here's where I've operated:

Fintech
SOC 2 compliance, audit logging, secrets management, PCI-adjacent architectures. High-stakes uptime where downtime means regulatory exposure.
SOC 2 · IAM · Vault · Audit Logs
Big Data & ML
Kafka streaming pipelines, Spark batch processing, Airflow orchestration, MLflow model registry. Infrastructure that data teams actually want to use.
Kafka · Spark · MLflow · Airflow
eCommerce
Traffic spike handling, auto-scaling, CDN architecture, zero-downtime deployments during peak events. Black Friday is not the time to discover your platform can't scale.
Autoscaling · CDN · Blue-Green · Spot
Entertainment & Streaming
Low-latency delivery, global CDN, real-time video infrastructure, high-availability for live events. Millions of concurrent users don't wait for deploys.
CDN · Multi-Region · HA · Low Latency
Telecom & ISP
BGP routing, VoIP infrastructure migration to cloud, 99.999% uptime SLAs, IPsec VPN connectivity across enterprise clients. Where networking fundamentals actually matter.
BGP · VoIP · IPsec · 99.999%
SaaS & Scale-ups
Growing from scrappy pipelines to enterprise-grade platforms. Kubernetes migrations, GitOps rollouts, IDP adoption — without stopping the business.
Kubernetes · GitOps · IDP · Terraform
02

Success Stories

−50% cloud cost
Cutting Cloud Spend in Half Without Touching SLAs
Multi-cloud environment · AWS + GCP · Fintech workloads
Infrastructure costs were growing faster than revenue. Ran a systematic FinOps audit: Spot Instance migration for stateless workloads, Savings Plans for baseline compute, rightsizing 200+ EC2 instances, and architectural consolidation of redundant services. Zero SLA degradation, zero prod incidents during migration.
30–50% cost reduction · 0 SLA breaches · 3 cloud providers
−45% release cycle
From Weekly Releases to Daily Deploys
Multi-team product org · GitLab CI → GitHub Actions + ArgoCD
Release cycles were slow and error-prone — manual steps, inconsistent environments, and deployment anxiety. Rebuilt pipelines end-to-end: standardized environments with Terraform, introduced progressive delivery (blue-green + canary), rolled out GitOps with ArgoCD. Change failure rate dropped below 2%, MTTR under 10 minutes.
45% faster releases · <2% failure rate · 70% less manual work
4T+ datapoints
Building Observability That Actually Works at Scale
High-throughput platform · VictoriaMetrics + Prometheus + Loki + Grafana
The existing monitoring was fragmented — infra metrics in one place, app metrics in another, business metrics nowhere. Built a unified observability stack processing 4M+ metrics/min with 4+ trillion datapoints stored, automated incident routing to PagerDuty, and custom dashboards for engineering and business stakeholders alike.
4M+ metrics/min · 4T+ datapoints · 1 unified stack
SOC 2 ready
Shifting Security Left Across the Entire SDLC
Multi-team engineering org · Fintech compliance requirements
Security was reactive: vulnerabilities were found in prod, secrets were occasionally committed to Git, and IaC configs drifted from policy. Embedded tfsec, Checkov, Snyk, and TFLint as mandatory CI/CD gates. Standardized secrets handling with Vault + AWS Secrets Manager. Centralized SSO via SAML 2.0 and OAuth 2.0. Supported the full SOC 2 readiness audit end to end.
100% IaC policy gates · SOC 2 compliant · 0 secrets in Git
ML team unblocked
ML Platform That Data Scientists Actually Use
Big Data domain · Airflow MWAA + MLflow + Argo Workflows + Kafka + Spark
The data science team was bottlenecked on infra: model training jobs competed for resources, environments were not reproducible, and retraining pipelines required manual intervention. Built end-to-end ML infrastructure: managed Airflow for orchestration, MLflow for experiment tracking and model registry, Argo Workflows for scalable training, Kafka + Spark for feature pipelines.
0 manual retrain steps · Full experiment tracking · Auto-scaling training
RTO < 10 min
DR Strategy That Survives a Region Outage
Multi-region AWS architecture · Fintech & eCommerce
The DR plan existed on paper but had never been tested. Designed and implemented a real multi-AZ, multi-region architecture with automated failover, database replication, and runbook-driven recovery. Conducted regular DR drills. Achieved RTO under 10 minutes and near-zero RPO — tested, not estimated.
<10 min RTO · ~0 RPO · Tested, not estimated
03

Experience

Apr 2019 – Present
Confidential
Remote · Fintech · Big Data · eCommerce · Entertainment
Lead Platform & DevOps Engineer
Current · 6 yrs
  • Platform Ownership: Designed and operated a multi-region, self-service internal platform on Kubernetes — enabling dev teams to deploy independently, reducing DevOps intervention by 25%
  • ML Infrastructure: Provisioned and operated end-to-end ML pipeline infrastructure (Airflow/MWAA, MLflow, Argo Workflows, Kafka, Spark) — enabling model deployment and retraining without platform bottlenecks
  • Observability at Scale: Built a high-throughput monitoring stack processing 4M+ metrics/min and storing 4+ trillion datapoints; unified infra, application, and business metrics with automated incident routing
  • Cost Engineering: Reduced cloud infrastructure costs by 30–50% across providers via Spot Instances, rightsizing, Savings Plans, and architectural refactoring — real money saved, not projections
  • CI/CD Transformation: Cut release cycle time by 45% and dropped deployment error rate below 2% by redesigning pipelines and introducing progressive delivery strategies
  • GitOps at Scale: Rolled out ArgoCD + Helm across all environments; reduced manual deployment actions by 70%, enabling auditable and reproducible releases
  • DevSecOps: Embedded tfsec, Checkov, Snyk, and TFLint into CI/CD as policy gates; standardized secrets handling with Vault + AWS Secrets Manager; supported SOC 2 readiness end-to-end
  • Infrastructure as Code: Automated 90% of infrastructure provisioning with Terraform + Terragrunt; zero manual environment setup across 3 cloud providers
  • Resilience & DR: Designed multi-AZ, multi-region architecture with RTO < 10 min and near-zero RPO; blue-green deployments as standard rollout strategy
  • Team Leadership: Led a team of 4 DevOps engineers; drove technical planning, quarterly roadmap, and mentoring — aligning platform work to business delivery goals
Jun 2016 – Apr 2019
Adelina Call Center
Kyiv, UA · On-site
Senior Systems Engineer
  • Cloud Migration: Migrated core VoIP platform to cloud infrastructure; achieved 99.999% uptime post-migration
  • Database Infrastructure: Built HA PostgreSQL clusters with replication, monitoring, and performance tuning for latency-sensitive workloads
  • Observability: Deployed full-stack Zabbix monitoring with a custom Telegram bot for real-time metric queries — zero-config alerting for ops team
  • Automation: Scripted provisioning and support workflows that automatically resolved ~50% of recurring Jira tickets
  • Networking & Security: Established IPsec VPN tunnels with external partners; managed AD, Exchange, and Linux server fleet
  • Virtualization: Operated VMware and XenServer HA clusters supporting critical business services
Mar 2016 – Jun 2016
Cosmonova LLC
Kyiv, UA · On-site
Network Administrator
  • BGP & Routing: Managed routing across ISP backbone; optimized BGP paths for performance and reliability across customer networks
  • Streaming Infrastructure: Improved availability and reliability of real-time video streaming services under production load
  • Security Hardening: Hardened public-facing perimeter; mitigated infrastructure-level threats in cloud environment
  • Mail & Linux Ops: Deployed internal mail server with spam filtering; tuned high-load Linux servers for stable production traffic
04

Skills

MLOps & AI Infrastructure
MLflow · Airflow (MWAA) · Argo Workflows · Kafka · Apache Spark · AWS Step Functions · TensorFlow
Cloud Platforms
AWS · Azure · GCP · IAM · Cost Optimization · Spot Instances
Containers & Orchestration
Kubernetes · Docker · Istio · Linkerd
Infrastructure as Code
Terraform · Terragrunt · CloudFormation · Ansible · Packer
GitOps & CI/CD
ArgoCD · Helm · GitHub Actions · GitLab CI · Jenkins · Azure DevOps
Monitoring & Observability
Prometheus · Grafana · VictoriaMetrics · ELK Stack · Loki · Datadog
Security & Compliance
SOC 2 · Vault · GuardDuty · WAF · Checkov · tfsec · Snyk
Data & Streaming
Kafka · Apache Spark · Airflow (MWAA) · Argo Workflows · MLflow
Networking
BGP · IPsec · OpenVPN · DNS · CDN · Load Balancing · Nginx
SSO & Identity
SAML 2.0 · OAuth 2.0 · OIDC · Keycloak · Azure AD · PingFederate
Scripting & Databases
Python · Bash · PostgreSQL · MySQL · Redis · DynamoDB
05

Education & Certifications

Education

2016 – 2019
National Aviation University
Bachelor's · Computer Science & IT · Cybersecurity · Kyiv
2012 – 2016
Industrial Economics College, NAU
Associate · Engineering · Electronic Engineering · Kyiv

Languages

English
C1
Ukrainian
Native
Russian
Native
Spanish
A2

Certifications & Training

2025
Mathematical Foundations of Machine Learning
Udemy
2024
TensorFlow for Deep Learning Bootcamp
Udemy
2022
Designing & Implementing Microsoft DevOps Solutions
Microsoft
2021
Azure Administrator Associate AZ-104
Microsoft
2021
Azure Fundamentals AZ-900
CloudGuru
2021
Certified Kubernetes Administrator (CKA) Prep
Udemy
2019
Architecting with Google Cloud Platform
Coursera
2017
CCNA Security
Cisco · Offline
2016
CCNA Routing & Switching + CCENT
Cisco · Offline