
How to Cut Your AWS RDS Bill by 40% Without Downtime

Updated by Zak Kann

Key takeaways

  • Right-sizing instances based on CloudWatch metrics (CPU, memory, IOPS) can reduce costs 20-30% by identifying over-provisioned databases running at under 40% utilization
  • Graviton 2/3 instances provide 40% better price-performance than Intel instances with simple in-place upgrades and zero application changes
  • Reserved Instances save 15-22% on 1-year terms and up to ~49% with 3-year commitments for predictable production workloads (note: AWS Savings Plans do not apply to RDS)
  • Storage optimization through gp3 migration, IOPS right-sizing, and automated snapshot lifecycle policies saves 15-25% on storage costs
  • Aurora Serverless v2 eliminates idle costs for non-production databases that traditionally run 24/7 at 5-10% utilization, saving up to 70% for dev/staging

The RDS Cost Explosion

Your startup launched 2 years ago with a single db.t3.medium RDS instance costing $60/month. Fast forward to today:

Current RDS Bill Breakdown:

Production DB (db.r6i.2xlarge):        $1,008/month
Staging DB (db.r6i.xlarge):              $504/month
Dev DB (db.r6i.large):                   $252/month
Read Replicas (2 × db.r6i.xlarge):     $1,008/month
Backups (500GB automated + manual):      $250/month
Data Transfer:                           $180/month
────────────────────────────────────────────────────
Total:                                 $3,202/month
Annual:                               $38,424/year

Your CFO just flagged this as the #2 infrastructure cost after compute. Your task: Cut RDS costs by 40% without downtime or performance degradation.

Target: $1,921/month ($23,052/year) = $15,372 annual savings
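The 40% target is simple arithmetic; a quick sketch (figures from the breakdown above):

```python
def rds_savings_target(monthly_bill, reduction=0.40):
    """Return (target monthly spend, annual savings) for a given percentage cut."""
    target_monthly = monthly_bill * (1 - reduction)
    annual_savings = (monthly_bill - target_monthly) * 12
    return target_monthly, annual_savings

target, savings = rds_savings_target(3202)
print(f"Target: ${target:,.0f}/month, ${savings:,.0f}/year saved")
```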

Strategy 1: Right-Size Your Instances (20-30% Savings)

The Over-Provisioning Problem

Most RDS instances are dramatically over-provisioned:

Common pattern:
- Provisioned: db.r6i.2xlarge (8 vCPU, 64GB RAM)
- Actual usage: 25% CPU, 18GB RAM (28% memory)
- Cost: $1,008/month
- Right-sized: db.r6i.xlarge (4 vCPU, 32GB RAM)
- New cost: $504/month
- Savings: $504/month (50%)

How to Right-Size: Data-Driven Approach

Step 1: Analyze CloudWatch Metrics (30 days)

# Check CPU utilization
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=production-db \
  --start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Average,Maximum
 
# Returns hourly Average and Maximum datapoints, e.g.:
# 30-day average: ~22% CPU
# Peak hour max:   68% CPU
#
# For P95/P99 percentiles, run a second call with
# --extended-statistics p95 p99 (the API accepts Statistics
# or ExtendedStatistics, but not both in one call)
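A percentile summary such as P95 can also be computed client-side from the hourly datapoints the CLI returns; a minimal nearest-rank sketch (the sample values are illustrative):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the ceil(p/100 * N)-th smallest value."""
    ordered = sorted(values)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]

# Hourly CPU averages pulled from get-metric-statistics (illustrative)
cpu_samples = [18, 22, 25, 31, 38, 41, 52, 68]
print(percentile(cpu_samples, 50), percentile(cpu_samples, 95))
```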

CloudWatch Metrics Insights query (note: one metric per query):

SELECT AVG(CPUUtilization)
FROM SCHEMA("AWS/RDS", DBInstanceIdentifier)
WHERE DBInstanceIdentifier = 'production-db'
GROUP BY DBInstanceIdentifier

Repeat the query with MAX(CPUUtilization), AVG(FreeableMemory), AVG(ReadIOPS), and AVG(WriteIOPS); Metrics Insights does not support selecting multiple metrics in a single statement.

Step 2: Calculate Required Resources

import boto3
from datetime import datetime, timedelta
 
cloudwatch = boto3.client('cloudwatch')
 
def analyze_rds_sizing(instance_id, days=30):
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=days)
 
    metrics = {
        'CPUUtilization': {'unit': 'Percent'},
        'FreeableMemory': {'unit': 'Bytes'},
        'ReadIOPS': {'unit': 'Count/Second'},
        'WriteIOPS': {'unit': 'Count/Second'}
    }
 
    results = {}
 
    for metric_name, config in metrics.items():
        response = cloudwatch.get_metric_statistics(
            Namespace='AWS/RDS',
            MetricName=metric_name,
            Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': instance_id}],
            StartTime=start_time,
            EndTime=end_time,
            Period=3600,
            Statistics=['Average', 'Maximum']
        )
 
        datapoints = sorted(response['Datapoints'], key=lambda x: x['Timestamp'])
 
        # Skip metrics with no data
        if not datapoints:
            print(f"Warning: No datapoints for {metric_name}, skipping")
            continue
 
        if metric_name == 'FreeableMemory':
            # Get instance memory size
            rds = boto3.client('rds')
            instance = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
            instance_class = instance['DBInstances'][0]['DBInstanceClass']
 
            # Memory mapping (GB)
            memory_map = {
                'db.r6i.large': 16,
                'db.r6i.xlarge': 32,
                'db.r6i.2xlarge': 64,
                'db.r6i.4xlarge': 128
            }
 
            total_memory_gb = memory_map.get(instance_class, 0)
            if total_memory_gb == 0:
                print(f"Warning: Unknown instance class {instance_class}, skipping memory calc")
                continue
 
            avg_free_gb = sum(d['Average'] for d in datapoints) / len(datapoints) / (1024**3)
            avg_used_gb = total_memory_gb - avg_free_gb
            avg_used_pct = (avg_used_gb / total_memory_gb) * 100
 
            results['memory_used_pct'] = avg_used_pct
        else:
            avg = sum(d['Average'] for d in datapoints) / len(datapoints)
            max_val = max(d['Maximum'] for d in datapoints)
            results[f'{metric_name}_avg'] = avg
            results[f'{metric_name}_max'] = max_val
 
    return results
 
# Analyze
stats = analyze_rds_sizing('production-db')
 
print(f"CPU Average: {stats['CPUUtilization_avg']:.1f}%")
print(f"CPU Max: {stats['CPUUtilization_max']:.1f}%")
print(f"Memory Used: {stats['memory_used_pct']:.1f}%")
print(f"IOPS Average: {stats['ReadIOPS_avg'] + stats['WriteIOPS_avg']:.0f}")
 
# Recommendation logic
if stats['CPUUtilization_avg'] < 40 and stats['memory_used_pct'] < 50:
    print("✅ Recommendation: Downsize by 1 instance class")
elif stats['CPUUtilization_max'] > 80 or stats['memory_used_pct'] > 85:
    print("⚠️ Recommendation: Keep current size or upsize")
else:
    print("✓ Recommendation: Current size is appropriate")

Step 3: Perform In-Place Modification (Near-Zero Downtime)

# Terraform configuration with zero-downtime modification
resource "aws_db_instance" "production" {
  identifier     = "production-db"
  instance_class = "db.r6i.xlarge"  # Changed from db.r6i.2xlarge
 
  # Enable Multi-AZ for zero-downtime modifications
  multi_az = true
 
  # Apply immediately during maintenance window
  apply_immediately = false
 
  # Preferred maintenance window (Sunday 3-4 AM)
  maintenance_window = "sun:03:00-sun:04:00"
}

What happens during modification:

  1. Standby replica is modified first
  2. Automatic failover to standby (30-60 second connection interruption)
  3. Primary becomes standby and is modified
  4. Total user-visible downtime: <2 minutes

Result:

Before: db.r6i.2xlarge = $1,008/month
After:  db.r6i.xlarge  = $504/month
Savings: $504/month (50%)

Strategy 2: Migrate to Graviton Instances (20% Savings, 40% Better Price-Performance)

The Graviton Advantage

AWS Graviton 2/3 processors (ARM-based) provide:

  • 40% better price-performance than Intel x86
  • Same RAM and storage configurations
  • Compatible with PostgreSQL, MySQL, MariaDB
  • Zero application code changes required

Price Comparison (us-east-1, Single-AZ, On-Demand):

db.r6i.2xlarge (Intel):    $1.008/hour = $735/month
db.r6g.2xlarge (Graviton): $0.806/hour = $588/month
Savings:                    $147/month (20%)

db.r6i.xlarge (Intel):     $0.504/hour = $368/month
db.r6g.xlarge (Graviton):  $0.403/hour = $294/month
Savings:                    $74/month (20%)
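The monthly figures above are hourly list prices multiplied by roughly 730 hours; a small sketch to reproduce them:

```python
HOURS_PER_MONTH = 730  # 24 * 365 / 12

def monthly_cost(hourly_rate):
    return hourly_rate * HOURS_PER_MONTH

def graviton_savings(intel_hourly, graviton_hourly):
    """Monthly savings and percent reduction from an Intel -> Graviton swap."""
    saved = monthly_cost(intel_hourly) - monthly_cost(graviton_hourly)
    pct = saved / monthly_cost(intel_hourly) * 100
    return saved, pct

saved, pct = graviton_savings(1.008, 0.806)  # db.r6i.2xlarge -> db.r6g.2xlarge
print(f"${saved:.0f}/month ({pct:.0f}%)")
```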

Migration Options

Method 1: Blue-Green Deployment (Zero Downtime)

# Step 1: Create read replica on Graviton
resource "aws_db_instance" "graviton_replica" {
  identifier             = "production-db-graviton"
  replicate_source_db    = aws_db_instance.production.identifier
  instance_class         = "db.r6g.2xlarge"  # Graviton
  publicly_accessible    = false
  skip_final_snapshot    = true
 
  # Use same configuration as primary
  multi_az                = true
  backup_retention_period = 7
}
 
# Step 2: Promote replica to primary (via AWS Console or CLI)
# aws rds promote-read-replica --db-instance-identifier production-db-graviton
 
# Step 3: Update application connection string
# OLD: production-db.abc123.us-east-1.rds.amazonaws.com
# NEW: production-db-graviton.xyz789.us-east-1.rds.amazonaws.com
 
# Step 4: Delete old primary after validation
resource "aws_db_instance" "production" {
  # ... delete this resource
}

Cutover process:

# 1. Stop application writes (maintenance mode)
# 2. Wait for replication to catch up (watch the CloudWatch ReplicaLag
#    metric; the query below confirms replication status, not lag)
aws rds describe-db-instances \
  --db-instance-identifier production-db-graviton \
  --query 'DBInstances[0].StatusInfos[?StatusType==`read replication`].Status'
 
# 3. Promote replica
aws rds promote-read-replica \
  --db-instance-identifier production-db-graviton
 
# 4. Update DNS CNAME or application config
# 5. Resume application writes
 
# Total downtime: 0-5 minutes (DNS propagation)

Method 2: Restore from Snapshot (15-Minute Downtime)

# 1. Put the application in maintenance mode first (writes made after
#    the snapshot are lost), then snapshot the current database
aws rds create-db-snapshot \
  --db-instance-identifier production-db \
  --db-snapshot-identifier production-db-pre-graviton
 
# 2. Restore snapshot to Graviton instance
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier production-db-new \
  --db-snapshot-identifier production-db-pre-graviton \
  --db-instance-class db.r6g.2xlarge \
  --multi-az
 
# 3. Update application connection string
# 4. Delete old instance
 
# Total downtime: 10-15 minutes (restore + DNS)

Performance Validation:

-- Before migration (Intel)
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- Planning Time: 0.125 ms
-- Execution Time: 2.341 ms
 
-- After migration (Graviton)
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- Planning Time: 0.098 ms
-- Execution Time: 1.867 ms (20% faster)

Strategy 3: Reserved Instances (15-49% Savings)

The Math

On-Demand Pricing:

db.r6g.2xlarge: $0.806/hour
Annual cost: $0.806 × 24 × 365 = $7,061

Reserved Instance (1-Year Partial Upfront):

Upfront payment: $2,400
Hourly rate: $0.414
Annual cost: $2,400 + ($0.414 × 24 × 365) = $6,027
Savings: $1,034/year (15%)

Reserved Instance (1-Year All Upfront):

Upfront payment: $5,500
Monthly charge: $0
Annual cost: $5,500
Savings: $1,561/year (22%)

Reserved Instance (3-Year All Upfront):

Upfront payment: $10,800
Effective annual cost: $3,600
Savings: $3,461/year (49%)
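To compare the terms on one axis, amortize each option into an effective annual cost; a sketch using the rates above:

```python
ON_DEMAND_HOURLY = 0.806  # db.r6g.2xlarge, us-east-1
HOURS_PER_YEAR = 24 * 365

def effective_annual_cost(upfront, hourly, term_years=1):
    """Amortized yearly RI cost: upfront spread over the term plus hourly usage."""
    return upfront / term_years + hourly * HOURS_PER_YEAR

on_demand = ON_DEMAND_HOURLY * HOURS_PER_YEAR            # ~ $7,061
partial_1yr = effective_annual_cost(2400, 0.414)         # ~ $6,027
all_3yr = effective_annual_cost(10800, 0, term_years=3)  # $3,600

for label, cost in [("on-demand", on_demand), ("1-yr partial", partial_1yr), ("3-yr all", all_3yr)]:
    print(f"{label}: ${cost:,.0f} ({(1 - cost / on_demand) * 100:.0f}% off on-demand)")
```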

When to Buy Reserved Instances

✅ Buy RIs for:

  • Production databases running 24/7
  • Stable instance types (no planned migrations)
  • Predictable workloads (next 1-3 years)

❌ Don't buy RIs for:

  • Development/staging (use Aurora Serverless instead)
  • Databases you plan to migrate
  • Unpredictable workloads

Terraform management:

# RIs are purchased manually via AWS Console or CLI
# Document in Terraform using tags
 
resource "aws_db_instance" "production" {
  identifier     = "production-db"
  instance_class = "db.r6g.2xlarge"
 
  tags = {
    ReservedInstance = "ri-abc123"
    RIPurchaseDate   = "2025-01-15"
    RIExpirationDate = "2026-01-15"
    RIType           = "1-year-partial-upfront"
  }
}

Reserved Instances vs. Savings Plans

One caveat before comparing: AWS Savings Plans do not apply to RDS. Compute and EC2 Instance Savings Plans cover EC2, Fargate, and Lambda only, so for RDS the commitment mechanism is Reserved Instances. RDS RIs do, however, offer size flexibility that recovers much of the flexibility you might expect from a Savings Plan:

Commit to an instance family in a region: e.g., r6g in us-east-1
- The discount applies across sizes within the same family, region, and engine
  (one 2xlarge RI also covers two xlarge or four large instances)
- Size flexibility is available for MySQL, MariaDB, PostgreSQL, Aurora,
  and Oracle BYOL (not license-included SQL Server)
- RDS RIs are regional (no AZ commitment), and unlike EC2 RIs they
  cannot be resold on the RI Marketplace

Flexibility: ✅✅
Discount: 15-49% depending on term and payment option

Recommendation: Start with 1-year RIs for stable production workloads, and move to 3-year terms once your instance families are settled; Savings Plans still apply to the EC2 side of your bill. For a comprehensive guide to reducing your overall AWS bill, see our cloud cost optimization strategies.

Strategy 4: Storage Optimization (15-25% Savings)

Problem: Overprovisioned IOPS

Common mistake:

Provisioned: io2, 10,000 IOPS, 1TB = $705/month
Actual usage: 800 IOPS average, 2,000 IOPS peak

Right-sized: gp3, 3,000 IOPS, 1TB = $80/month
Savings: $625/month (89%)

Storage Type Comparison (illustrative us-east-1 list prices; exact RDS rates vary by engine, region, and Multi-AZ)

Storage Type    | Cost/GB/mo | IOPS Included | Extra IOPS Cost
────────────────────────────────────────────────────────────────
gp2 (old)       | $0.115     | 3 IOPS/GB     | N/A (burstable)
gp3 (default)   | $0.080     | 3,000 baseline | $0.005/IOPS
io1             | $0.125     | 0              | $0.065/IOPS
io2             | $0.125     | 0              | $0.065/IOPS
Magnetic        | $0.100     | N/A            | N/A (deprecated)

Migration: gp2 → gp3 (30% Savings)

resource "aws_db_instance" "production" {
  identifier = "production-db"
 
  # Storage configuration
  storage_type          = "gp3"  # Changed from gp2
  allocated_storage     = 1000   # 1TB
  iops                  = 3000   # Customize if needed (default: 3000)
  storage_throughput    = 125    # MB/s (default: 125, max: 1000)
 
  # Apply during maintenance window
  apply_immediately = false
}

Cost impact:

Before (gp2): 1TB × $0.115 = $115/month
After (gp3):  1TB × $0.080 = $80/month
Savings: $35/month (30%)
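The gp3 billing model behind these numbers can be sketched as follows; the per-GB and per-IOPS rates come from the comparison table above, while the $0.04 per MB/s over-baseline throughput rate is an assumption not shown there:

```python
def gp3_monthly_cost(size_gb, iops=3000, throughput_mbps=125):
    """Monthly gp3 storage cost: per-GB rate plus anything above the free baselines."""
    cost = size_gb * 0.080                         # $0.080/GB-month
    cost += max(0, iops - 3000) * 0.005            # $0.005 per IOPS over the 3,000 baseline
    cost += max(0, throughput_mbps - 125) * 0.040  # assumed $0.04 per MB/s over the 125 baseline
    return cost

print(gp3_monthly_cost(1000))             # 1TB at baseline IOPS/throughput
print(gp3_monthly_cost(1000, iops=6000))  # 1TB with extra provisioned IOPS
```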

Snapshot Lifecycle Management

Problem: Unlimited manual snapshots accumulate

# List all snapshots
aws rds describe-db-snapshots \
  --snapshot-type manual \
  --query 'DBSnapshots[?SnapshotCreateTime<`2024-01-01`]'
 
# Result: 50 manual snapshots from 2023-2024
# Cost: 50 × 50GB × $0.095 = $237/month

Solution: Automated cleanup

import boto3
from datetime import datetime, timedelta, timezone

rds = boto3.client('rds')

def cleanup_old_snapshots(retention_days=30):
    """Delete manual snapshots older than retention_days"""

    cutoff_date = datetime.now(timezone.utc) - timedelta(days=retention_days)

    # Paginate: describe_db_snapshots returns at most 100 records per call
    paginator = rds.get_paginator('describe_db_snapshots')

    for page in paginator.paginate(SnapshotType='manual'):
        for snapshot in page['DBSnapshots']:
            # SnapshotCreateTime is timezone-aware UTC; compare directly
            snapshot_time = snapshot['SnapshotCreateTime']

            if snapshot_time < cutoff_date:
                print(f"Deleting snapshot: {snapshot['DBSnapshotIdentifier']} ({snapshot_time})")
                rds.delete_db_snapshot(
                    DBSnapshotIdentifier=snapshot['DBSnapshotIdentifier']
                )

# Run weekly via Lambda
cleanup_old_snapshots(retention_days=30)

Savings:

Before: 50 snapshots × 50GB × $0.095 = $237/month
After:  5 snapshots × 50GB × $0.095 = $24/month
Savings: $213/month (90%)

Strategy 5: Aurora Serverless for Non-Production (Up to 70% Savings)

The Problem with 24/7 Dev/Staging

Traditional approach:

Staging DB: db.r6g.xlarge, 24/7 = $294/month
- Actual usage: 9 AM - 6 PM weekdays = 40 hours/week
- Utilization: 40/168 = 24% (76% idle)

Aurora Serverless v2 Solution

Pricing model:

Aurora Capacity Units (ACUs):
- 1 ACU = 2GB RAM
- Cost: $0.12/ACU/hour (us-east-1)

Auto-scaling configuration:
- Min: 0.5 ACU (a 1GB RAM floor while idle; on recent engine versions,
  a minimum of 0 ACUs enables automatic pause, i.e. true scale-to-zero)
- Max: 16 ACUs (32GB RAM, for load testing)

Cost comparison:

Traditional RDS (db.r6g.xlarge, 24/7):
$294/month

Aurora Serverless v2 (40 hours/week active):
Active: 40 hours/week × 4 weeks × 8 ACUs × $0.12 = $154/month
Idle: 128 hours/week × 4 weeks × 0.5 ACUs × $0.12 = $31/month
Total: $185/month

Savings: $109/month (37%)

With scale-to-zero:
Active: $154/month
Idle: $0/month
Total: $154/month

Savings: $140/month (48%)
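The Serverless arithmetic above generalizes to a small helper; it assumes the article's simplification of 4 billing weeks per month (a real month averages closer to 4.35):

```python
ACU_HOURLY = 0.12   # us-east-1 Aurora Serverless v2 rate used above
HOURS_PER_WEEK = 168

def serverless_monthly_cost(active_hours_per_week, active_acus, idle_acus=0.5, weeks=4):
    """Approximate monthly cost: busy hours at peak ACUs, remaining hours at the idle floor."""
    idle_hours = HOURS_PER_WEEK - active_hours_per_week
    weekly_acu_hours = active_hours_per_week * active_acus + idle_hours * idle_acus
    return weekly_acu_hours * ACU_HOURLY * weeks

print(round(serverless_monthly_cost(40, 8)))               # 0.5 ACU idle floor
print(round(serverless_monthly_cost(40, 8, idle_acus=0)))  # scale-to-zero
```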

Terraform configuration:

resource "aws_rds_cluster" "staging" {
  cluster_identifier = "staging-db"
  engine             = "aurora-postgresql"
  engine_mode        = "provisioned"  # Serverless v2 uses provisioned mode
  engine_version     = "15.3"
 
  database_name   = "staging"
  master_username = var.db_username
  master_password = var.db_password
 
  serverlessv2_scaling_configuration {
    min_capacity = 0.5  # Scales to 1GB RAM when idle
    max_capacity = 16   # Scales to 32GB RAM under load
  }
 
  skip_final_snapshot = true
}
 
resource "aws_rds_cluster_instance" "staging" {
  cluster_identifier = aws_rds_cluster.staging.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.staging.engine
  engine_version     = aws_rds_cluster.staging.engine_version
}

Automated start/stop for dev environments:

import boto3
import os
 
rds = boto3.client('rds')
 
def lambda_handler(event, context):
    """
    Start dev databases at 8 AM, stop at 7 PM
    Triggered by EventBridge schedule
    """
 
    action = event.get('action')  # 'start' or 'stop'
    cluster_id = os.environ['CLUSTER_ID']
 
    if action == 'start':
        rds.start_db_cluster(DBClusterIdentifier=cluster_id)
        print(f"Started cluster: {cluster_id}")
    elif action == 'stop':
        rds.stop_db_cluster(DBClusterIdentifier=cluster_id)
        print(f"Stopped cluster: {cluster_id}")
 
    return {'statusCode': 200}

EventBridge schedule:

resource "aws_cloudwatch_event_rule" "start_dev_db" {
  name                = "start-dev-db"
  schedule_expression = "cron(0 8 ? * MON-FRI *)"  # 8 AM weekdays (EventBridge cron runs in UTC)
}
 
resource "aws_cloudwatch_event_target" "start_dev_db" {
  rule = aws_cloudwatch_event_rule.start_dev_db.name
  arn  = aws_lambda_function.db_scheduler.arn
  input = jsonencode({
    action = "start"
  })
}
 
resource "aws_cloudwatch_event_rule" "stop_dev_db" {
  name                = "stop-dev-db"
  schedule_expression = "cron(0 19 ? * MON-FRI *)"  # 7 PM weekdays (EventBridge cron runs in UTC)
}
 
resource "aws_cloudwatch_event_target" "stop_dev_db" {
  rule = aws_cloudwatch_event_rule.stop_dev_db.name
  arn  = aws_lambda_function.db_scheduler.arn
  input = jsonencode({
    action = "stop"
  })
}

# Note: EventBridge also needs an aws_lambda_permission resource
# granting events.amazonaws.com permission to invoke the function.

Putting It All Together: 40% Cost Reduction Plan

Current State

Production DB (db.r6i.2xlarge, On-Demand):  $1,008/month
Staging DB (db.r6i.xlarge, On-Demand):        $504/month
Dev DB (db.r6i.large, 24/7):                  $252/month
Read Replicas (2 × db.r6i.xlarge):          $1,008/month
Backups (500GB):                              $250/month
────────────────────────────────────────────────────────
Total:                                      $3,022/month
Annual:                                    $36,264/year

(Data transfer, $180/month in the opening breakdown, is unchanged by these strategies and omitted from the before/after tables.)

Optimized State

Production DB (db.r6g.2xlarge, 1-yr RI):      $485/month (Graviton + RI)
Staging (Aurora Serverless v2, auto-scale):   $185/month (70% idle reduction)
Dev (Aurora Serverless v2, 9-6 weekdays):      $80/month (scale-to-zero)
Read Replicas (2 × db.r6g.xlarge):            $588/month (Graviton, on-demand)
Backups (gp3, cleanup old snapshots):          $80/month (snapshot lifecycle)
────────────────────────────────────────────────────────
Total:                                      $1,418/month
Annual:                                    $17,016/year

Savings: $19,248/year (53%)
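The before/after totals are easy to sanity-check (figures copied from the two tables above):

```python
before = {
    "production": 1008, "staging": 504, "dev": 252,
    "read_replicas": 1008, "backups": 250,
}
after = {
    "production": 485, "staging": 185, "dev": 80,
    "read_replicas": 588, "backups": 80,
}

monthly_savings = sum(before.values()) - sum(after.values())
pct = monthly_savings / sum(before.values()) * 100
print(f"${monthly_savings * 12:,}/year ({pct:.0f}%)")
```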

Implementation Timeline

Week 1-2: Analysis

  • Run CloudWatch analysis scripts
  • Document current usage patterns
  • Calculate right-sized instance classes

Week 3-4: Low-Risk Changes

  • Migrate staging to Aurora Serverless v2
  • Migrate dev to Aurora Serverless v2 with auto-stop
  • Implement snapshot lifecycle management
  • Migrate storage from gp2 to gp3

Week 5-6: Production Optimization

  • Right-size production instance (if over-provisioned)
  • Create Graviton read replica
  • Purchase 1-year Reserved Instances

Week 7-8: Graviton Migration

  • Promote Graviton read replica to primary
  • Validate performance
  • Delete Intel instances

Total time: 8 weeks
Total downtime: 0-5 minutes (DNS cutover)

Monitoring Cost Savings

CloudWatch Dashboard:

resource "aws_cloudwatch_dashboard" "rds_costs" {
  dashboard_name = "rds-cost-optimization"
 
  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
          metrics = [
            ["AWS/RDS", "CPUUtilization", "DBInstanceIdentifier", "production-db", { stat = "Average" }],
            [".", "DatabaseConnections", ".", ".", { stat = "Sum" }],
            [".", "FreeableMemory", ".", ".", { stat = "Average" }]
          ]
          period = 3600
          stat   = "Average"
          region = "us-east-1"
          title  = "RDS Performance Metrics"
        }
      },
      {
        type = "metric"
        properties = {
          metrics = [
            ["AWS/RDS", "ReadIOPS", "DBInstanceIdentifier", "production-db", { stat = "Average" }],
            [".", "WriteIOPS", ".", ".", { stat = "Average" }]
          ]
          period = 3600
          stat   = "Average"
          region = "us-east-1"
          title  = "IOPS Utilization"
        }
      }
    ]
  })
}

Cost tracking in AWS Cost Explorer:

Filters:
- Service: Amazon RDS
- Tag: Environment = Production
- Grouping: Instance Type

Compare:
- Previous month (before optimization)
- Current month (after optimization)

Conclusion: 40%+ Savings Are Achievable

The strategies in this guide are proven and low-risk:

Quick wins (Week 1-2):

  • Aurora Serverless for non-production: 15-20% savings
  • Snapshot lifecycle management: 5% savings

Medium effort (Week 3-6):

  • Right-sizing instances: 10-15% savings
  • gp3 migration: 3-5% savings

Long-term optimization (Week 7-8):

  • Graviton migration: 10-15% savings
  • Reserved Instances: 5-10% additional savings

Total: 43-65% savings depending on starting point

The key is to start small, measure everything, and make changes during maintenance windows. Use proper tagging strategies to track savings by environment and team.

Action Items

  1. Run CloudWatch analysis: Identify over-provisioned instances (30-day metrics)
  2. Calculate current costs: Use AWS Cost Explorer with RDS filter
  3. Create test plan: Staging environment first, then production
  4. Migrate to gp3: Low-risk, immediate 20% storage savings
  5. Implement Aurora Serverless: For all non-production databases
  6. Purchase RIs: After confirming stable instance types post-optimization

If you need help optimizing your RDS costs, schedule a consultation. We'll audit your current RDS configuration, provide a customized optimization plan with projected savings, and guide implementation with zero downtime.
