Infrastructure as Code

Structuring Terraform for Scale: Monorepo vs. Polyrepo

Updated By Zak Kann

Key takeaways

  • Monorepo centralizes all Terraform in one repository with shared modules, enabling atomic changes and consistent tooling, but requires discipline around blast radius and CI/CD complexity
  • Polyrepo splits infrastructure by service or team with isolated state, providing clear ownership and independent deployment velocity, but creates module duplication and version drift
  • Team size is the primary decision factor: monorepo works for 2-15 engineers with strong collaboration; polyrepo scales to 50+ engineers with autonomous teams
  • Hybrid approaches (shared modules in monorepo, workspaces in polyrepo) can provide the best of both worlds for mid-sized organizations
  • Migration between strategies is possible but requires careful planning around state files, CI/CD pipelines, and team coordination

The Terraform Organization Problem

You started with one main.tf file. It worked great. Then you added staging. Then production. Then VPC, RDS, ECS, Lambda, CloudFront... Now you have:

  • 47 .tf files in one directory
  • 12 developers committing to the same repo
  • CI/CD taking 23 minutes to plan all resources
  • A production incident caused by a staging change
  • No clear ownership of infrastructure components

The question: Should you keep everything in one repository (monorepo) or split into multiple repositories (polyrepo)?

The real answer: It depends on your team size, deployment model, and organizational structure.

Monorepo: Everything in One Place

Definition: All Terraform code lives in a single repository with shared modules and centralized tooling.

Typical Structure

terraform-infrastructure/
├── .github/
│   └── workflows/
│       ├── plan.yml
│       └── apply.yml
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── backend.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   ├── backend.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── production/
│       ├── main.tf
│       ├── backend.tf
│       ├── variables.tf
│       └── terraform.tfvars
├── modules/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── ecs-service/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── README.md
│   ├── rds/
│   └── lambda/
├── scripts/
│   ├── validate.sh
│   ├── plan-all.sh
│   └── apply.sh
└── README.md
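
The tree lists a backend.tf for each environment without showing one. A minimal sketch, assuming an S3 backend like the one used in the remote state examples later in this article; the lock table name is hypothetical:

```hcl
# environments/production/backend.tf
terraform {
  backend "s3" {
    bucket         = "terraform-state"                           # bucket name reused from the remote state example
    key            = "environments/production/terraform.tfstate" # one state file per environment
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"                           # hypothetical lock table name
    encrypt        = true
  }
}
```

Giving each environment its own state key means an apply in staging cannot modify production state, even though both live in the same repository.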

Advantages

1. Atomic Changes Across Environments

# modules/ecs-service/main.tf
resource "aws_ecs_task_definition" "app" {
  family = var.service_name
 
  container_definitions = jsonencode([{
    name      = var.service_name
    image     = var.docker_image
    cpu       = var.cpu
    memory    = var.memory
 
    # Bug fix: Add health check (affects all environments simultaneously)
    healthCheck = {
      command     = ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
      interval    = 30
      timeout     = 5
      retries     = 3
      startPeriod = 60
    }
  }])
}

When you fix a bug in the module, every environment picks up the fix from one commit (each environment still needs its own plan and apply). No need to propagate the change across 5 repositories.

2. Shared Modules with Single Source of Truth

# environments/production/main.tf
module "api_service" {
  source = "../../modules/ecs-service"
 
  service_name = "api"
  docker_image = "api:v1.2.3"
  cpu          = 1024
  memory       = 2048
 
  vpc_id            = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnet_ids
}
 
module "worker_service" {
  source = "../../modules/ecs-service"  # Same module, guaranteed consistent
 
  service_name = "worker"
  docker_image = "worker:v1.0.5"
  cpu          = 512
  memory       = 1024
 
  vpc_id            = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnet_ids
}
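
Both services call the same module, so the module's input variables act as the contract between them. A sketch of what modules/ecs-service/variables.tf might contain; the types are inferred from the calls above, and the memory floor is a hypothetical rule added only to show validation:

```hcl
# modules/ecs-service/variables.tf (sketch; types and the validation rule are assumptions)
variable "service_name" {
  description = "Name used for the task definition family and container"
  type        = string
}

variable "docker_image" {
  description = "Container image reference, for example api:v1.2.3"
  type        = string
}

variable "cpu" {
  description = "CPU units for the container"
  type        = number
}

variable "memory" {
  description = "Memory in MiB for the container"
  type        = number

  validation {
    condition     = var.memory >= 512 # hypothetical floor
    error_message = "Memory must be at least 512 MiB."
  }
}

variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}
```

Because the rule lives in the module, every caller in every environment is checked the same way at plan time.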

3. Centralized Tooling and Standards

# .github/workflows/plan.yml
name: Terraform Plan
 
on:
  pull_request:
    paths:
      - 'environments/**'
      - 'modules/**'
 
jobs:
  changed-environments:
    runs-on: ubuntu-latest
    outputs:
      environments: ${{ steps.detect.outputs.environments }}
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
 
      - name: Detect changed environments
        id: detect
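        run: |
          # Plan every environment if a shared module changed; otherwise only the ones touched
          FILES=$(git diff --name-only origin/main...HEAD)
          if echo "$FILES" | grep -q '^modules/'; then
            CHANGED=$(ls environments | jq -R -s -c 'split("\n")[:-1]')
          else
            CHANGED=$(echo "$FILES" | grep '^environments/' | cut -d'/' -f2 | sort -u | jq -R -s -c 'split("\n")[:-1]')
          fi
          echo "environments=$CHANGED" >> "$GITHUB_OUTPUT"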
 
  plan:
    needs: changed-environments
    if: needs.changed-environments.outputs.environments != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: ${{ fromJson(needs.changed-environments.outputs.environments) }}
 
    steps:
      - uses: actions/checkout@v3
 
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.6.0
 
      - name: Terraform Init
        working-directory: environments/${{ matrix.environment }}
        run: terraform init
 
      - name: Terraform Plan
        working-directory: environments/${{ matrix.environment }}
        run: |
          terraform plan -out=tfplan
          terraform show -json tfplan > plan.json
 
      - name: Post plan to PR
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('environments/${{ matrix.environment }}/plan.json', 'utf8');
            // Post plan summary to PR comment

4. Easy Refactoring

Renaming a module? One commit affects all usages:

git mv modules/ecs-service modules/ecs-fargate-service
find environments -type f -name "*.tf" -exec sed -i 's|modules/ecs-service|modules/ecs-fargate-service|g' {} +
git commit -m "Rename ecs-service module to ecs-fargate-service"

Disadvantages

1. Blast Radius Risk

One bad merge to main can affect all environments:

# Someone accidentally commits this to production/main.tf
resource "aws_security_group_rule" "allow_all" {
  type              = "ingress"
  from_port         = 0
  to_port           = 65535
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]  # Oops, security vulnerability
  security_group_id = module.vpc.default_security_group_id
}

Mitigation:

# .github/workflows/apply.yml
- name: Require manual approval for production
  if: matrix.environment == 'production'
  uses: trstringer/manual-approval@v1
  with:
    secret: ${{ github.TOKEN }}  # the action needs a token to open and read the approval issue
    approvers: platform-team
    minimum-approvals: 2
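
Approval gates depend on a reviewer noticing the mistake. As a complement, a shared module can reject the risky input outright. A minimal sketch, assuming the module takes a hypothetical ingress_cidr_blocks input (not part of the module shown earlier):

```hcl
# Hypothetical module input: callers must list the CIDR ranges they need
variable "ingress_cidr_blocks" {
  description = "CIDR ranges allowed to reach the service"
  type        = list(string)

  validation {
    # Fail the plan if anyone passes the open-to-the-world range
    condition     = !contains(var.ingress_cidr_blocks, "0.0.0.0/0")
    error_message = "Ingress from 0.0.0.0/0 is not allowed. List specific CIDR ranges instead."
  }
}
```

This only guards rules created through the module; a raw aws_security_group_rule written directly in an environment directory still needs review or a policy scanner.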

2. Long CI/CD Times

As the monorepo grows, planning all environments takes longer:

# Initial: 2 minutes
terraform plan (dev, staging, production)
 
# 6 months later: 15 minutes
terraform plan (dev, staging, prod, dr, sandbox-1, sandbox-2, ...)
 
# 12 months later: 45 minutes
terraform plan (10 environments × 20 modules each)

Mitigation: Selective planning based on changed files (shown in workflow above)

3. Merge Conflicts

10 engineers changing infrastructure simultaneously:

Auto-merging environments/production/main.tf
CONFLICT (content): Merge conflict in environments/production/main.tf
Automatic merge failed; fix conflicts and then commit the result.

Mitigation: Smaller, focused PRs and feature flags

4. Difficulty Enforcing Team Boundaries

Team A owns VPC, Team B owns ECS. But both can edit each other's code:

modules/
├── vpc/        # Team A
└── ecs/        # Team B (but Team A can still modify this)

Mitigation: CODEOWNERS file + required reviews

# .github/CODEOWNERS
/modules/vpc/** @team-networking
/modules/ecs/** @team-platform
/environments/production/** @team-platform @team-security

Polyrepo: Separated by Ownership

Definition: Infrastructure split across multiple repositories, typically by service, team, or functional area.

Typical Structure

# Repository: terraform-networking
terraform-networking/
├── modules/
│   ├── vpc/
│   └── transit-gateway/
├── dev/
│   ├── main.tf
│   └── backend.tf
├── staging/
└── production/

# Repository: terraform-ecs-api
terraform-ecs-api/
├── modules/
│   └── ecs-service/
├── dev/
├── staging/
└── production/

# Repository: terraform-ecs-worker
terraform-ecs-worker/
├── modules/
│   └── ecs-service/  # Duplicate of terraform-ecs-api module!
├── dev/
├── staging/
└── production/

# Repository: terraform-rds
terraform-rds/
├── modules/
│   └── postgres/
├── dev/
├── staging/
└── production/

Advantages

1. Clear Ownership and Autonomy

Each team owns their repository completely:

# terraform-ecs-api/.github/workflows/deploy.yml
name: Deploy API Service
 
on:
  push:
    branches: [main]
 
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to production
        run: |
          cd production
          terraform init
          terraform apply -auto-approve
        # No cross-team dependencies, no waiting for other teams' reviews

2. Independent Deployment Velocity

Team A can deploy 20 times/day without affecting Team B:

Team A (API): 20 deployments/day
Team B (Worker): 3 deployments/day
Team C (Networking): 1 deployment/week

# No coordination required, no merge conflicts

3. Blast Radius Isolation

Breaking change in one repo doesn't affect others:

# terraform-ecs-worker/production/main.tf
# This mistake only affects worker, not API or RDS
resource "aws_ecs_service" "worker" {
  desired_count = 0  # Accidentally scaled to zero
  # API service continues running normally
}

4. Easier Access Control

GitHub repository permissions per team:

terraform-networking:     @team-networking (admin)
terraform-ecs-api:        @team-backend (admin)
terraform-ecs-worker:     @team-backend (admin)
terraform-rds:            @team-database (admin), @team-backend (read)

Disadvantages

1. Module Duplication

Same ECS module copied across 5 repositories:

terraform-ecs-api/modules/ecs-service/     (version 1.2.3)
terraform-ecs-worker/modules/ecs-service/  (version 1.2.3)
terraform-ecs-cron/modules/ecs-service/    (version 1.1.0)  # Drift!
terraform-ecs-admin/modules/ecs-service/   (version 1.2.5)  # Different version

Solution: Publish modules to private Terraform Registry

# terraform-ecs-api/production/main.tf
module "api_service" {
  source  = "app.terraform.io/company/ecs-service/aws"
  version = "1.2.3"  # Centralized versioning
 
  service_name = "api"
  docker_image = "api:v1.0.0"
}

2. Cross-Repository Dependencies

API service needs VPC ID from networking repo:

# terraform-ecs-api/production/main.tf
# How do we get the VPC ID from terraform-networking?
 
# Option 1: Hardcode (brittle)
variable "vpc_id" {
  default = "vpc-abc123"
}
 
# Option 2: Data source (better)
data "aws_vpc" "main" {
  tags = {
    Name = "production-vpc"
  }
}
 
# Option 3: Remote state (best)
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "terraform-state"
    key    = "networking/production/terraform.tfstate"
    region = "us-east-1"
  }
}
 
module "api_service" {
  source = "..."
 
  vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
}

Problem: Tight coupling between repositories via remote state
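
One way to loosen that coupling is to publish shared values through a parameter store, so consumers need no read access to the networking state bucket. A sketch under stated assumptions: the parameter path is a hypothetical naming convention, and the two halves live in different repositories.

```hcl
# In terraform-networking/production: publish the value under a well-known name
resource "aws_ssm_parameter" "vpc_id" {
  name  = "/network/production/vpc_id" # hypothetical naming convention
  type  = "String"
  value = aws_vpc.main.id
}

# In terraform-ecs-api/production: read it without touching networking state
data "aws_ssm_parameter" "vpc_id" {
  name = "/network/production/vpc_id"
}

module "api_service" {
  source = "..." # elided, as in the example above

  vpc_id = data.aws_ssm_parameter.vpc_id.value
}
```

The consumer now depends on a parameter name, not on the layout of another team's state file. The AWS provider marks the value attribute as sensitive, so it appears redacted in plan output.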

3. Inconsistent Tooling

Each repo can drift in standards:

terraform-networking:  Terraform 1.6.0, using tflint
terraform-ecs-api:     Terraform 1.5.5, using tfsec
terraform-rds:         Terraform 1.4.0, no linting

Solution: Shared CI/CD templates or organization-level GitHub Actions

4. Difficult Cross-Cutting Changes

Upgrading AWS provider across 15 repositories:

# Must update in 15 separate PRs
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"  # Want to upgrade to 5.0
    }
  }
}
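
Shared modules can soften this by declaring a range that spans both major versions, which lets each repository move its root constraint from ~> 4.0 to ~> 5.0 on its own schedule. A sketch, assuming the module's resources behave the same on both majors (this has to be verified per module):

```hcl
# In a shared module, for example modules/ecs-service/versions.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0, < 6.0" # accept both majors during the migration window
    }
  }
}
```

Once every consuming repository has upgraded, the lower bound can be raised to 5.0 in a new module release.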

Decision Framework

Choose Monorepo If:

  • Team size: 2-15 engineers
  • Strong collaboration culture (daily standups, shared ownership)
  • Infrastructure changes affect multiple services simultaneously
  • You value consistency over autonomy
  • Deployment frequency: 5-20 deploys/week across all services
  • Your infrastructure is tightly coupled (shared VPC, shared RDS)

Example: Startup with 5 engineers managing API + worker + database

Choose Polyrepo If:

  • Team size: 15+ engineers across multiple teams
  • Teams operate autonomously (microservices, separate on-call)
  • Each service has independent infrastructure lifecycle
  • You value autonomy over consistency
  • Deployment frequency: 50+ deploys/day across all teams
  • Your infrastructure is loosely coupled (service mesh, separate VPCs)

Example: Scale-up with 40 engineers, 8 teams, 30 microservices

Hybrid Approach: The Middle Ground

Pattern 1: Shared Modules Monorepo + Workspaces Polyrepo

# Repository: terraform-modules (monorepo)
terraform-modules/
├── vpc/
├── ecs-service/
├── rds/
└── lambda/

# Repository: team-backend-infrastructure (polyrepo)
team-backend-infrastructure/
└── workspaces/
    ├── api/
    │   ├── dev/
    │   ├── staging/
    │   └── production/   # main.tf references the terraform-modules repo
    └── worker/
        ├── dev/
        ├── staging/
        └── production/

A module source must be a directory or a remote address, not a .tf file, so each workspace references the shared repository directly and pins a tag:

# workspaces/api/production/main.tf
module "api" {
  source = "git::https://github.com/company/terraform-modules.git//ecs-service?ref=v1.2.3"

  service_name = "api"
  environment  = "production"
}

Pattern 2: Monorepo with Workspace Isolation

terraform-infrastructure/
├── shared-modules/
│   ├── vpc/
│   └── ecs/
├── teams/
│   ├── backend/
│   │   ├── api/
│   │   │   ├── dev/
│   │   │   ├── staging/
│   │   │   └── production/
│   │   └── worker/
│   ├── frontend/
│   │   └── cdn/
│   └── data/
│       └── pipeline/
└── .github/
    └── workflows/
        ├── team-backend.yml
        ├── team-frontend.yml
        └── team-data.yml

# .github/workflows/team-backend.yml
name: Backend Team Infrastructure
 
on:
  push:
    paths:
      - 'teams/backend/**'
      - 'shared-modules/**'
  pull_request:
    paths:
      - 'teams/backend/**'
 
# Backend team's changes only trigger their pipeline

Real-World Case Study: Migration from Monorepo to Polyrepo

Company: B2B SaaS, $10M ARR
Team growth: 8 → 35 engineers over 18 months
Problem: Monorepo CI/CD taking 40 minutes, 10+ merge conflicts/day

Migration Strategy

Phase 1: Identify Service Boundaries (Week 1-2)

# Current monorepo
terraform-infrastructure/
├── networking/      → Extract to terraform-networking
├── ecs-api/         → Extract to team-backend/terraform-api
├── ecs-worker/      → Extract to team-backend/terraform-worker
├── rds/             → Extract to team-database/terraform-rds
└── cloudfront/      → Extract to team-frontend/terraform-cdn

Phase 2: Extract Shared Modules (Week 3-4)

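# Create terraform-modules repository
git clone terraform-infrastructure terraform-modules
cd terraform-modules

# Keep only modules/ directory (Git now recommends git filter-repo; filter-branch still works)
git filter-branch --subdirectory-filter modules -- --all

# Point origin at the new repository; after the clone it still points at the monorepo
git remote set-url origin https://github.com/company/terraform-modules.git
git push origin main

# Tag release
git tag v1.0.0
git push origin v1.0.0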

Phase 3: Create Service Repositories (Week 5-8)

# For each service
git clone terraform-infrastructure terraform-networking
cd terraform-networking

# Keep only networking directory
git filter-branch --subdirectory-filter networking -- --all

# Point origin at the new repository (URL assumed to follow the terraform-modules pattern)
git remote set-url origin https://github.com/company/terraform-networking.git

# Update module references and pin them to the tagged release
find . -type f -name "*.tf" -exec sed -i -E 's|"\.\./\.\./modules/([^"]+)"|"git::https://github.com/company/terraform-modules.git//\1?ref=v1.0.0"|g' {} +

# Expose the outputs that other repositories read via remote state
cat > production/networking.tf <<EOF
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}
EOF

git add .
git commit -m "Extract networking to separate repository"
git push origin main
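
The script above moves code, not state. If networking already had its own state file in the monorepo, the new repository can point its backend at the same key and nothing else has to move. If networking shared a state file with other components, ownership of each resource has to be handed over. A sketch of one way to do that, assuming Terraform 1.7 or later (removed blocks; import blocks need 1.5), which is newer than the 1.6.0 pinned in the earlier workflow. The VPC ID is a placeholder.

```hcl
# In the monorepo, after deleting the resource block:
# stop managing the VPC without destroying it
removed {
  from = aws_vpc.main

  lifecycle {
    destroy = false
  }
}

# In terraform-networking/production:
# adopt the existing VPC into the new repository's state
import {
  to = aws_vpc.main
  id = "vpc-abc123" # placeholder ID
}
```

Run terraform plan in both repositories before applying and confirm that neither plan proposes to create or destroy the resource.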

Phase 4: Update Remote State References (Week 9-10)

# terraform-ecs-api/production/main.tf
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "company-terraform-state"
    key    = "networking/production/terraform.tfstate"
    region = "us-east-1"
  }
}
 
module "api_service" {
  source = "git::https://github.com/company/terraform-modules.git//ecs-service?ref=v1.0.0"
 
  vpc_id            = data.terraform_remote_state.networking.outputs.vpc_id
  private_subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}

Phase 5: Update CI/CD (Week 11-12)

# terraform-networking/.github/workflows/deploy.yml
name: Deploy Networking
 
on:
  push:
    branches: [main]
    paths:
      - 'dev/**'
      - 'staging/**'
      - 'production/**'
 
jobs:
  deploy:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, production]
 
    steps:
      - uses: actions/checkout@v3
 
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
 
      - name: Terraform Apply
        working-directory: ${{ matrix.environment }}
        run: |
          terraform init
          terraform apply -auto-approve

Results After Migration

Before (Monorepo):

  • CI/CD time: 40 minutes
  • Merge conflicts: 10+ per day
  • Deploy frequency: 15 deploys/week
  • Team autonomy: Low (cross-team reviews required)

After (Polyrepo):

  • CI/CD time: 8 minutes per service
  • Merge conflicts: 1-2 per week
  • Deploy frequency: 80 deploys/week
  • Team autonomy: High (teams self-service)

Trade-offs:

  • Module management: Now requires versioning discipline
  • Cross-service changes: Require coordination (but rare)
  • Tooling consistency: Requires shared CI/CD templates

Best Practices for Both Approaches

Monorepo Best Practices

1. Use Terragrunt for DRY Configuration

# terragrunt.hcl (root)
remote_state {
  backend = "s3"
  config = {
    bucket = "terraform-state-${get_aws_account_id()}"
    key    = "${path_relative_to_include()}/terraform.tfstate"
    region = "us-east-1"
  }
}
 
# environments/production/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
 
terraform {
  source = "../../modules//ecs-service"
}
 
inputs = {
  service_name = "api"
  environment  = "production"
  cpu          = 1024
  memory       = 2048
}

2. Environment-Specific Workspaces

# Use workspaces for environment separation within monorepo
cd environments/shared
terraform workspace new dev
terraform workspace new staging
terraform workspace new production
 
terraform workspace select production
terraform apply -var-file="production.tfvars"
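
With workspaces, one configuration serves every environment, so the code has to branch on the selected workspace. A minimal sketch, assuming the ecs-service module inputs shown earlier; the sizing values are illustrative:

```hcl
# environments/shared/main.tf (sketch; sizing values are illustrative)
variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

locals {
  sizing = {
    dev        = { cpu = 256, memory = 512 }
    staging    = { cpu = 512, memory = 1024 }
    production = { cpu = 1024, memory = 2048 }
  }

  # Fails at plan time if the selected workspace has no entry above
  env = local.sizing[terraform.workspace]
}

module "api_service" {
  source = "../../modules/ecs-service"

  service_name = "api-${terraform.workspace}"
  docker_image = "api:v1.2.3"
  cpu          = local.env.cpu
  memory       = local.env.memory

  vpc_id             = var.vpc_id
  private_subnet_ids = var.private_subnet_ids
}
```

All workspaces share the same backend configuration and credentials, so this gives weaker isolation than the per-directory layout shown at the start of the monorepo section. It tends to suit environments that differ only in sizing.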

3. Pre-commit Hooks for Validation

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.83.5
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
      - id: terraform_tfsec

Polyrepo Best Practices

1. Private Terraform Registry

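# Publishing goes through the private registry in Terraform Cloud/Enterprise.
# (A cloud block only connects a configuration to a workspace for state and runs;
# it does not publish modules.) With the VCS-driven workflow:
#   1. Name the module repository terraform-<PROVIDER>-<NAME>,
#      for example terraform-aws-ecs-service
#   2. Connect that repository to the organization's private registry
#   3. Push a semantic version tag (for example v1.2.3) for each release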
 
# Consume in service repos
module "ecs_service" {
  source  = "app.terraform.io/company/ecs-service/aws"
  version = "~> 1.2"
 
  service_name = "api"
}

2. Standardized Repository Template

# Create template repository: terraform-service-template
terraform-service-template/
├── .github/
│   └── workflows/
│       ├── plan.yml
│       └── apply.yml
├── modules/
├── dev/
├── staging/
├── production/
├── .pre-commit-config.yaml
├── .tflint.hcl
└── README.md

# Use as template for new services
gh repo create terraform-new-service --template company/terraform-service-template --private

3. Automated Dependency Updates

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "terraform"
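    directories:  # Dependabot only scans the directories listed; the .tf files live per environment
      - "/dev"
      - "/staging"
      - "/production"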
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 5
    reviewers:
      - "platform-team"

Conclusion: No One-Size-Fits-All

The monorepo vs. polyrepo decision isn't binary—it's a spectrum:

Small team (2-10 engineers): Start with monorepo

  • Simple to manage
  • Easy atomic changes
  • Low coordination overhead

Growing team (10-25 engineers): Consider hybrid

  • Shared modules in separate repo
  • Services in monorepo with workspaces
  • Team-specific CI/CD paths

Large organization (25+ engineers): Move to polyrepo

  • Service ownership per team
  • Independent deployment velocity
  • Scale autonomy

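The key: Match your repository structure to your team structure. Conway's Law applies to infrastructure code:

"Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure." (Melvin Conway, 1968)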

If your teams are tightly coupled, monorepo works. If your teams are autonomous, polyrepo scales better.

Action Items

  1. Assess your current pain points: Long CI/CD? Merge conflicts? Unclear ownership?
  2. Map your team structure: How many teams? How do they collaborate?
  3. Measure deployment frequency: Deploys per week per team
  4. Identify service boundaries: Which infrastructure components are independent?
  5. Start small: Migrate one service to polyrepo (or consolidate two repos into monorepo)
  6. Iterate based on team feedback: Survey developers on autonomy vs. consistency

