From Zero to Production-Style Kubernetes on AWS with K3s and GitOps
How I built a production-style Kubernetes platform on AWS EC2 using K3s, Terraform, Argo CD, ingress-nginx, cert-manager, Prometheus, Grafana, and Loki, including a real ingress incident and recovery.
February 19, 2026
I built this project to simulate how a real platform team operates: declarative infrastructure, GitOps-based delivery, automated HTTPS, full observability, and incident-driven improvement.
This is not a toy cluster: it is a cost-conscious, production-style Kubernetes platform running on AWS EC2 using K3s.
This structure keeps infrastructure, platform, and workloads cleanly separated while supporting multi-environment strategy.
GitOps Delivery Model
The deployment workflow is deterministic and drift-resistant:
1. I push changes to GitHub.
2. Argo CD detects configuration drift.
3. Argo CD reconciles the desired state into the cluster.
4. Manual changes are automatically reverted unless committed to Git.
[Screenshot: Argo CD applications synced and healthy]
This enforces Git as the single source of truth and prevents configuration drift.
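This reconciliation behavior comes from Argo CD's automated sync policy. A minimal Application sketch illustrates it; the repo URL, app name, and paths here are hypothetical, not taken from the project:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ingress-nginx            # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/platform-gitops  # hypothetical repo
    targetRevision: main
    path: platform/ingress-nginx                         # assumed layout
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress-nginx
  syncPolicy:
    automated:
      prune: true      # delete cluster resources removed from Git
      selfHeal: true   # revert manual in-cluster changes back to Git state
```

`selfHeal: true` is the setting that reverts manual kubectl edits, which matters again in the incident described below.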
Ingress and TLS Automation
I use ingress-nginx to expose services and cert-manager to automate certificate issuance and renewal via Let's Encrypt.
[Screenshot: Ingress-NGINX deployment managed through GitOps]
[Screenshot: Kube Prometheus Stack managed through GitOps]
[Screenshot: cert-manager handling certificate issuance]
[Screenshot: TLS validation proof from the live endpoint]
The result is fully automated HTTPS without manual certificate management.
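The automation typically hinges on a ClusterIssuer plus an annotation on each Ingress. A sketch assuming the ACME HTTP-01 solver; the email and secret name are placeholders, not values from the project:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com                 # hypothetical contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key     # ACME account key storage
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx          # solved via ingress-nginx
```

An Ingress then opts in with the `cert-manager.io/cluster-issuer: letsencrypt-prod` annotation and a `tls:` block, and cert-manager creates and renews the certificate on its own.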
Observability Stack
Operational visibility is implemented through:
Prometheus - scraping cluster and workload metrics
Grafana - dashboards and monitoring visualization
Loki - centralized log aggregation
Promtail - log shipping from workloads
[Screenshot: Grafana dashboard for platform monitoring]
[Screenshot: Centralized logs with Loki in Grafana]
This enables metric-based monitoring and log-based troubleshooting from a single interface.
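Wiring logs into the same interface as metrics is usually a small values change. A sketch of hypothetical kube-prometheus-stack Helm values registering Loki as an extra Grafana datasource (the service URL assumes Loki runs in a `monitoring` namespace):

```yaml
# Hypothetical Helm values for kube-prometheus-stack:
# register Loki as an additional Grafana datasource so
# metrics and logs are queryable from one interface.
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      url: http://loki.monitoring.svc.cluster.local:3100  # assumed service DNS
      access: proxy
```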
Real Incident That Improved the Platform
While migrating ingress-nginx to full GitOps management, I experienced a production-style outage.
Symptoms
TLS certificate mismatch
Public endpoint unreachable
Root Causes
K3s default Traefik was still serving traffic.
ingress-nginx was configured as NodePort.
After disabling Traefik, no component owned host ports 80/443, so inbound traffic had nowhere to land.
A manual fix temporarily worked but was reverted by Argo CD because Git still defined the old state.
Durable Fix
Permanently disable Traefik in K3s configuration.
Change ingress-nginx service type in Git to LoadBalancer.
Allow Argo CD to reconcile the corrected desired state.
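The durable fix lives in two declarative places. First, the K3s server configuration, so Traefik stays disabled across restarts and upgrades (a sketch of `/etc/rancher/k3s/config.yaml`):

```yaml
# /etc/rancher/k3s/config.yaml -- K3s server configuration
disable:
  - traefik    # stop K3s from deploying its bundled Traefik ingress
```

Second, the ingress-nginx values committed to Git; the field path follows the upstream ingress-nginx Helm chart:

```yaml
controller:
  service:
    type: LoadBalancer  # K3s's built-in servicelb then binds host ports 80/443
```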
This incident reinforced the core GitOps principle:
If it is not in Git, it is not a durable fix.
Current Trade-offs
To balance realism and cost:
Single-node cluster (cost-efficient, not HA)
Some bootstrap steps remain manual (initial K3s + Argo CD installation)
Environment overlays are evolving
These are deliberate engineering trade-offs, not oversights.
What's Next
Planned improvements:
Add CI validation:
yamllint
terraform validate
Kubernetes schema validation
Fully GitOps-manage cert-manager installation
Expand environment overlays for workloads
Add policy enforcement and security hardening
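The CI validation items above could be gated in one pipeline. A minimal GitHub Actions sketch; the workflow path, Terraform directory, and manifest paths are assumptions, and kubeconform stands in for "Kubernetes schema validation":

```yaml
# .github/workflows/validate.yaml (hypothetical)
name: validate
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: yamllint
        run: pip install yamllint && yamllint .
      - name: terraform validate
        run: |
          cd infra                    # assumed Terraform directory
          terraform init -backend=false
          terraform validate
      - name: kubernetes schema validation
        run: |
          go install github.com/yannh/kubeconform/cmd/kubeconform@latest
          ~/go/bin/kubeconform -strict platform/   # assumed manifest path
```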
What This Project Demonstrates
Infrastructure as Code with Terraform
Kubernetes cluster operations (K3s)
GitOps architecture with Argo CD
Ingress and TLS automation
Observability integration (metrics + logs)
Incident debugging and architectural correction
Production-style operational discipline
Final Thoughts
This project helped me practice the parts of Kubernetes work that matter most in production:
Platform design
Delivery workflows
Failure modes
Observability
Recovery through correct architecture
Building the platform was valuable.
Debugging it under real constraints is what made it production-grade.