Infrastructure as Code: Lessons from the Trenches
Infrastructure as Code promises that you can manage your infrastructure the same way you manage application code — versioned, reviewed, tested, deployed. The promise is real. The execution is harder than it sounds.
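State management is the first thing to get right. Terraform state maps every resource in your configuration to the real infrastructure object it manages. Lose it, and Terraform no longer knows those objects exist: the next plan proposes creating everything from scratch. We store state in an S3 bucket with versioning enabled, so a corrupted or overwritten state file can be rolled back, and we use a DynamoDB table for locking, so two applies cannot run at the same time. Every plan and apply runs in CI, not on a developer's laptop.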
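
A minimal sketch of the backend setup described above, assuming the standard S3 backend. The bucket, key, region, and table names are placeholders rather than real values, and versioning is a property of the bucket itself, configured wherever the bucket is defined.

```hcl
# Sketch only: every name below is a hypothetical placeholder.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"            # S3 bucket with versioning enabled
    key            = "services/example/terraform.tfstate" # path of this stack's state object
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks"            # table with a string hash key named "LockID"
    encrypt        = true                                 # server-side encryption of the state object
  }
}
```

Keeping one state key per stack limits how much infrastructure a single bad apply or lost state file can affect.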
Modules should be opinionated. A good module encapsulates a decision: this is how we run a service, this is how we set up a database. It should have sensible defaults and a small number of configuration knobs. Too many knobs mean every deployment is different, and you lose the consistency that IaC is supposed to provide.
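
As an illustration of "few knobs, sensible defaults", here is a sketch of what the input surface of such a module might look like. The module path, variable names, and allowed values are all hypothetical.

```hcl
# modules/service/variables.tf (hypothetical module layout)

variable "name" {
  type        = string
  description = "Service name, used to derive resource names."
}

variable "environment" {
  type        = string
  description = "Deployment environment."

  # Reject anything outside the supported set at plan time.
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}

variable "instance_count" {
  type        = number
  default     = 2 # the opinionated default; most callers never set this
  description = "Number of instances to run."
}
```

A caller then only states what is specific to its service:

```hcl
module "checkout" {
  source      = "./modules/service"
  name        = "checkout"
  environment = "prod"
}
```

Everything else (networking, logging, instance sizing) stays a decision made once inside the module.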
Plan output is a review artifact. Every Terraform plan gets posted as a comment on the PR. The reviewer checks for unexpected changes — a security group opening port 22 to the world, a database being destroyed and recreated. If the plan is clean, the apply runs automatically after merge.
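
Plan review depends on a human noticing the dangerous line. One complementary guard, shown here as an assumption rather than something this workflow is stated to use, is Terraform's `prevent_destroy` lifecycle setting, which makes any plan that would destroy or replace the resource fail outright. The table and attribute names are hypothetical.

```hcl
# Sketch only: resource and attribute names are placeholders.
resource "aws_dynamodb_table" "orders" {
  name         = "example-orders"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "order_id"

  attribute {
    name = "order_id"
    type = "S"
  }

  lifecycle {
    # Terraform returns an error instead of producing a plan
    # that destroys (or destroys and recreates) this table.
    prevent_destroy = true
  }
}
```

Removing the guard then requires its own reviewed change, which turns an accidental destroy into a deliberate one.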
Secrets never touch the plan. We use Vault for secrets and pass them as environment variables at apply time, so they are never committed to the repository. Terraform state is a different matter: it can contain sensitive values in plaintext, which is why the state file is encrypted at rest and access to it is audited.
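
One way this can look in configuration, sketched under the assumption that the CI job reads the secret from Vault and exports it as `TF_VAR_db_password` before running the apply. The variable and resource names are hypothetical.

```hcl
# Sketch only: names and sizes are placeholders.
variable "db_password" {
  type        = string
  sensitive   = true # value is redacted in plan and apply output
  description = "Database password, supplied via the TF_VAR_db_password environment variable."
}

resource "aws_db_instance" "main" {
  identifier          = "example-db"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = var.db_password # never hard-coded in the repository
  skip_final_snapshot = true            # acceptable for a sketch, not for production
}
```

Note that `sensitive = true` only hides the value from CLI output. The password is still written to state, so encrypting and auditing the state file remains necessary.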
The goal of IaC is to make infrastructure changes boring. A boring change is a safe change. If your Terraform PRs are exciting, something is wrong.