When to Fire Your MSP and Build an In-House DevOps Team
Key takeaways
- MSPs make sense for seed-stage startups (under $1M ARR, fewer than 10 engineers) but become cost-inefficient and a velocity bottleneck beyond Series A
- Financial break-even occurs at $150K-$200K annual MSP spend when a single senior DevOps engineer ($180K loaded cost) provides more value and faster iteration
- Warning signs that it is time to transition include multi-day ticket response times, lack of IaC adoption, vendor lock-in to MSP-specific tools, and inability to customize infrastructure
- True MSP cost includes hidden expenses like engineering context switching (20% productivity loss), delayed feature launches, technical debt accumulation, and opportunity cost
- Successful transitions require 6-12 month phased approaches: hire first DevOps engineer, document existing infrastructure, implement IaC, gradually reduce MSP scope, and maintain knowledge transfer overlap
The MSP Paradox
Your startup just raised a Series A. You have 25 engineers shipping features daily. But every infrastructure change requires a ticket to your Managed Service Provider:
Monday, 9 AM:
Ticket #4521: Please increase RDS instance from db.t3.large to db.r5.xlarge
Priority: High
Estimated response time: 24-48 hours
Wednesday, 3 PM:
MSP Response: "We can schedule this change for Saturday at 2 AM EST.
Downtime estimate: 15-20 minutes. Please confirm."
Your engineering team: Blocked for 4 days waiting for a 30-second Terraform change.
The cost:
- Feature launch delayed 1 week
- 3 engineers context-switched to other work
- $50K revenue opportunity missed
Meanwhile, your competitor with an in-house DevOps team: Made the change in 10 minutes during business hours, with only a brief Multi-AZ failover instead of a scheduled outage.
The Financial Break-Even Point
MSP Cost Structure (Real Numbers)
Typical MSP pricing for Series A startup:
Base retainer: $5,000/month
- Includes: 40 hours support, basic monitoring, security patching
Per-hour overage: $200/hour
- Average usage: 20 hours/month
- Monthly overage: $4,000
AWS markup: 10-15%
- Your AWS spend: $15,000/month
- MSP markup: $2,250/month
Total MSP cost: $11,250/month = $135,000/year
At scale (Series B, 50 engineers):
Base retainer: $10,000/month
Per-hour overage: $200/hour (50 hours/month) = $10,000
AWS markup (15% of $50K): $7,500/month
Total MSP cost: $27,500/month = $330,000/year
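The two totals above follow the same formula: retainer plus overage plus markup on AWS spend. A minimal sketch of that arithmetic, assuming a hypothetical helper name (`msp_monthly_cost`) and using only the figures quoted above, so you can substitute your own contract terms:

```python
def msp_monthly_cost(retainer, overage_hours, overage_rate, aws_spend, markup_percent):
    """Monthly MSP cost = base retainer + hourly overage + markup on AWS spend."""
    overage = overage_hours * overage_rate
    markup = aws_spend * markup_percent / 100
    return retainer + overage + markup


# Series A figures from the article: $5K retainer, 20 overage hours, 15% of $15K
series_a = msp_monthly_cost(5_000, 20, 200, 15_000, 15)
# Series B figures from the article: $10K retainer, 50 overage hours, 15% of $50K
series_b = msp_monthly_cost(10_000, 50, 200, 50_000, 15)

print(f"Series A: ${series_a:,.0f}/month, ${series_a * 12:,.0f}/year")
print(f"Series B: ${series_b:,.0f}/month, ${series_b * 12:,.0f}/year")
```

The markup term is the one that grows silently: it scales with your AWS bill rather than with the work the MSP does.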
In-House DevOps Team Cost Structure
Scenario 1: Single Senior DevOps Engineer
Salary: $150,000/year
Benefits (30%): $45,000
Tools/SaaS: $12,000 (Datadog, PagerDuty, GitHub Actions, Terraform Cloud)
Recruitment: $10,000 (amortized over 3 years)
Total loaded cost: $217,000/year as itemized; the rest of this article uses a leaner $180,000/year planning figure
Break-even vs MSP: $150K-$200K annual MSP spend
Scenario 2: Two-Person Team (Series B+)
Senior DevOps Engineer: $180,000 loaded
Mid-level Platform Engineer: $140,000 loaded
Tools/SaaS: $18,000
Total: $338,000/year
Break-even vs MSP: $300K+ annual MSP spend
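It can help to put the direct costs side by side before looking at anything else. The sketch below uses the article's own annual figures and a hypothetical helper name (`direct_cost_gap`); it only compares sticker prices and deliberately ignores productivity and opportunity costs.

```python
def direct_cost_gap(msp_annual, in_house_annual):
    """Positive result: in-house costs more than the MSP on direct fees alone."""
    return in_house_annual - msp_annual


# Series A: $135K MSP fees vs one engineer at the article's $180K loaded cost
series_a_gap = direct_cost_gap(135_000, 180_000)
# Series B: $330K MSP fees vs the $338K two-person team
series_b_gap = direct_cost_gap(330_000, 338_000)

print(f"Series A gap: ${series_a_gap:,}/year")
print(f"Series B gap: ${series_b_gap:,}/year")
```

On direct fees alone the two options land close together at Series B, which is why the costs that never appear on the MSP invoice tend to decide the comparison.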
The Hidden Costs of MSPs
The sticker price doesn't include:
1. Engineering Productivity Loss
Average ticket response time: 24 hours
Number of infrastructure requests/month: 40
Total blocking time: 40 days/month across team
Engineering team size: 20
Percentage of time blocked: 10%
Loaded cost per engineer: $180K/year
Annual productivity loss: 20 × $180K × 10% = $360,000/year
2. Delayed Feature Launches
Features delayed by infrastructure: 3/quarter
Average revenue per feature: $50K/quarter
Annual opportunity cost: 12 × $50K = $600,000/year
3. Technical Debt Accumulation
MSP uses ClickOps (no IaC)
Manual configuration = undocumented changes
Debt cleanup: 6 months × $180K/year = $90,000
4. Vendor Lock-In
MSP-specific monitoring tools
Proprietary deployment scripts
Migration cost to standard tooling: $50,000
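The first two hidden costs are simple formulas, sketched here so you can plug in your own ticket volume and team size. The function names are hypothetical, and the 20 working days per month is an assumption that the article does not state (it is the value that makes 40 blocked days across 20 engineers come out to 10%).

```python
WORKING_DAYS_PER_MONTH = 20  # assumption, not stated in the article


def blocked_fraction(requests_per_month, days_blocked_per_request, team_size):
    """Share of total engineering time spent waiting on infrastructure tickets."""
    blocked_days = requests_per_month * days_blocked_per_request
    available_days = team_size * WORKING_DAYS_PER_MONTH
    return blocked_days / available_days


def annual_productivity_loss(team_size, loaded_cost, fraction_blocked):
    """Annual cost of the blocked time, at the loaded cost per engineer."""
    return team_size * loaded_cost * fraction_blocked


def annual_feature_delay_cost(delayed_per_quarter, revenue_per_feature):
    """Each delayed feature is assumed to forfeit one quarter of its revenue."""
    return delayed_per_quarter * 4 * revenue_per_feature


fraction = blocked_fraction(40, 1, 20)
loss = annual_productivity_loss(20, 180_000, fraction)
delay = annual_feature_delay_cost(3, 50_000)

print(f"Blocked: {fraction:.0%}, productivity loss: ${loss:,.0f}/year")
print(f"Delayed features: ${delay:,}/year")
```

Both results are sensitive to inputs that are hard to measure (how blocked an engineer really is during a 24-hour wait, how much revenue a feature carries), so treat them as order-of-magnitude estimates.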
Total Cost of Ownership (TCO) Comparison
MSP Total Cost (Series A, 25 engineers):
Direct MSP fees: $135,000
Engineering productivity loss: $200,000
Delayed features (3/year): $150,000
Technical debt: $30,000/year (amortized)
Total TCO: $515,000/year
In-House Total Cost (1 DevOps engineer):
Direct cost: $180,000
Productivity loss: $0 (same-day changes)
Delayed features: $0
Technical debt: -$50,000 (IaC implemented)
Total TCO: $130,000/year
Net savings: $385,000/year
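The comparison above is a sum of four components per option. A small sketch, assuming a hypothetical `AnnualTCO` structure, reproduces the totals and makes it easy to swap in your own estimates for each line:

```python
from dataclasses import dataclass


@dataclass
class AnnualTCO:
    direct: int
    productivity_loss: int
    delayed_features: int
    technical_debt: int  # negative means debt is being paid down

    def total(self) -> int:
        return (self.direct + self.productivity_loss
                + self.delayed_features + self.technical_debt)


# Figures taken from the Series A comparison in the article
msp = AnnualTCO(direct=135_000, productivity_loss=200_000,
                delayed_features=150_000, technical_debt=30_000)
in_house = AnnualTCO(direct=180_000, productivity_loss=0,
                     delayed_features=0, technical_debt=-50_000)

net_savings = msp.total() - in_house.total()
print(f"MSP: ${msp.total():,}  In-house: ${in_house.total():,}  Savings: ${net_savings:,}")
```

Note how much of the result rests on the two zeroes in the in-house column; if your in-house team still blocks engineers some of the time, enter that instead.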
Warning Signs You've Outgrown Your MSP
Sign 1: Ticket-Driven Velocity Bottleneck
Red flag scenario:
Your team: "We need to add a new Lambda function for the new API endpoint."
MSP: "Please submit a ticket with the following:
- Function name, runtime, memory allocation
- IAM role requirements
- VPC configuration
- Deployment package (upload to S3)
- Estimated traffic volume
Response time: 2-3 business days"
Your team: "...it's a 50-line function that would take 5 minutes to deploy."
What this looks like at scale:
- 40+ infrastructure tickets per month
- Average resolution time: 48 hours
- Engineers blocked 20% of sprint capacity
- Features shipped 2-3 weeks late
If you can't terraform apply without a ticket, you've outgrown your MSP.
Sign 2: No Infrastructure as Code
MSP approach:
# MSP's "infrastructure documentation"
RDS_INSTANCE_ID=prod-db-1
INSTANCE_CLASS=db.r5.xlarge
STORAGE=500GB
BACKUP_RETENTION=7 days
# Changed manually via AWS Console on 2024-11-15 by MSP Engineer
In-house approach:
# infrastructure/rds.tf
resource "aws_db_instance" "production" {
  identifier              = "prod-db-1"
  instance_class          = "db.r5.xlarge"
  allocated_storage       = 500
  backup_retention_period = 7
  # Git history shows who changed what and why
}
Why this matters:
- MSP changes are invisible (no audit trail)
- Disaster recovery requires "tribal knowledge"
- Compliance audits are nightmares
- Scaling horizontally requires manual replication
If your infrastructure isn't in Git, you don't control it.
Sign 3: AWS Markup Exceeds Tooling Cost
The math:
Your AWS spend: $50,000/month
MSP markup: 15% = $7,500/month = $90,000/year
Alternative tooling cost:
- Datadog: $2,000/month
- PagerDuty: $500/month
- Terraform Cloud: $1,000/month
- GitHub Actions: $500/month
Total: $4,000/month = $48,000/year
Savings by eliminating markup: $42,000/year
Paying $90K/year for AWS console access is indefensible.
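The same math also tells you the AWS spend at which the markup starts to exceed the tooling bill. This is a sketch with hypothetical function names; the tool prices are the article's illustrative figures, not vendor quotes.

```python
# Monthly tooling costs as listed in the article (illustrative figures)
tooling_monthly = {
    "Datadog": 2_000,
    "PagerDuty": 500,
    "Terraform Cloud": 1_000,
    "GitHub Actions": 500,
}


def annual_markup(aws_monthly_spend, markup_percent):
    """Yearly amount paid to the MSP as a percentage of the AWS bill."""
    return aws_monthly_spend * markup_percent // 100 * 12


def annual_tooling(costs):
    """Yearly cost of buying the equivalent tooling directly."""
    return sum(costs.values()) * 12


def markup_break_even_spend(costs, markup_percent):
    """Monthly AWS spend at which the markup equals the tooling cost."""
    return sum(costs.values()) * 100 / markup_percent


markup = annual_markup(50_000, 15)
tooling = annual_tooling(tooling_monthly)
threshold = markup_break_even_spend(tooling_monthly, 15)

print(f"Markup ${markup:,} vs tooling ${tooling:,}: saves ${markup - tooling:,}/year")
print(f"Markup exceeds tooling above ${threshold:,.0f}/month of AWS spend")
```

With these inputs the crossover sits a little under $27K/month of AWS spend, so the sign applies well before you reach $50K/month.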
Sign 4: Innovation Theater
MSP conversations sound like:
You: "Can we implement blue-green deployments for zero-downtime releases?"
MSP: "We can explore that in Q3 2025 as part of our roadmap."
You: "Can we use Spot Instances to save up to 70% on compute costs?"
MSP: "We recommend against that due to stability concerns."
You: "Can we migrate to Graviton instances for up to 40% better price-performance?"
MSP: "That's not currently supported in our standard offering."
Translation: "We don't want to change our template."
Your competitor's in-house team: Shipped all three in the same quarter.
Sign 5: You're Hiring DevOps Engineers Anyway
The trap:
Your job posting: "Senior DevOps Engineer
- Build CI/CD pipelines
- Optimize cloud costs
- Implement monitoring and alerting
- Work with MSP to provision infrastructure"
Wait... why are you paying both?
If you're hiring DevOps engineers to work around MSP limitations, fire the MSP.
The Transition Plan: 6-Month Roadmap
Month 1-2: Hire Your First DevOps Engineer
The ideal first hire profile:
- Senior level (5+ years experience)
- AWS Certified Solutions Architect
- Terraform expert
- Experience with both MSP and in-house environments
- Strong documentation skills (will be knowledge bridge)
Compensation benchmark:
Seed stage: $130K-$150K + 0.5-1% equity
Series A: $150K-$180K + 0.25-0.5% equity
Series B: $170K-$200K + 0.1-0.25% equity
Interview question to ask:
"Walk me through how you'd transition infrastructure from an MSP to in-house over 6 months with zero downtime. What would you do first?"
Good answer: "First, I'd audit existing infrastructure and create Terraform for net-new resources. Then gradually import existing resources into Terraform state while maintaining MSP as fallback."
Bad answer: "I'd immediately migrate everything to Kubernetes on day one."
Month 2-3: Infrastructure Audit and Documentation
Your new DevOps engineer's first 60 days:
Week 1-2: Discovery
# What exists?
aws ec2 describe-instances
aws rds describe-db-instances
aws ecs list-clusters
aws lambda list-functions
aws s3 ls
# Document everything in a spreadsheet:
# Resource Type | ID | Purpose | Owner | Dependencies | Cost/Month
Week 3-4: Terraform Pilot
# Start with low-risk, new resources
# Example: New development environment
module "dev_vpc" {
  source          = "terraform-aws-modules/vpc/aws"
  name            = "dev-vpc"
  cidr            = "10.1.0.0/16" # Use a CIDR calculator to plan your network
  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.1.1.0/24", "10.1.2.0/24"]
  public_subnets  = ["10.1.101.0/24", "10.1.102.0/24"]
}
# Don't touch production yet!
Week 5-8: Import Existing Resources
# Gradually import MSP-managed resources
# (each address needs a matching resource block in your .tf files before you import it)
terraform import aws_security_group.api sg-abc123
terraform import aws_db_instance.production prod-db-1
# Verify state matches reality
terraform plan # Should show "No changes"
Month 3-4: Parallel Infrastructure Management
Run both MSP and in-house in parallel:
New Resources (Month 3-4):
- All new services → In-house Terraform
- All new environments → In-house Terraform
- Net-new infrastructure → In-house Terraform
Existing Production Resources:
- RDS databases → Still MSP-managed
- ECS clusters → Still MSP-managed
- VPC/Networking → Still MSP-managed
Emergency Changes:
- MSP remains on-call for production incidents
- In-house engineer shadows all changes
Deliverables:
- 100% of new infrastructure deployed via Terraform
- Terraform state for 30% of existing resources
- Runbooks documenting all operational procedures
- Monitoring dashboards replicated in Datadog/CloudWatch
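To track the "30% of existing resources" deliverable, one option is to compare your discovery inventory against the output of `terraform show -json`. The sketch below is a rough approach under stated assumptions: the function names and sample IDs are made up, and it matches on each resource's `id` attribute, which for some resource types differs from the name shown in the console, so the match key may need adjusting.

```python
import json


def managed_ids(module):
    """Collect the `id` attribute of every managed resource, including child modules."""
    ids = set()
    for resource in module.get("resources", []):
        if resource.get("mode") == "managed":
            resource_id = resource.get("values", {}).get("id")
            if resource_id:
                ids.add(resource_id)
    for child in module.get("child_modules", []):
        ids |= managed_ids(child)
    return ids


def terraform_coverage(state, inventory_ids):
    """Fraction of inventoried resource IDs that appear in Terraform state."""
    if not inventory_ids:
        return 0.0
    root = state.get("values", {}).get("root_module", {})
    imported = managed_ids(root) & set(inventory_ids)
    return len(imported) / len(inventory_ids)


# Trimmed, made-up example of `terraform show -json` output
sample_state = json.loads("""
{
  "values": {
    "root_module": {
      "resources": [
        {"address": "aws_security_group.api", "mode": "managed",
         "values": {"id": "sg-abc123"}}
      ],
      "child_modules": [
        {"address": "module.dev_vpc",
         "resources": [
           {"address": "module.dev_vpc.aws_vpc.this[0]", "mode": "managed",
            "values": {"id": "vpc-0example"}}
         ]}
      ]
    }
  }
}
""")

# Hypothetical inventory IDs from the discovery spreadsheet
inventory = ["sg-abc123", "vpc-0example", "sg-def456", "i-0example"]
coverage = terraform_coverage(sample_state, inventory)
print(f"Terraform coverage: {coverage:.0%}")
```

In practice you would feed it the parsed output of `terraform show -json` from each workspace and the ID column of your discovery spreadsheet.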
Month 4-5: Production Cutover Planning
Test disaster recovery with MSP as safety net:
Week 1: Test RDS failover
- In-house team: Promotes read replica using Terraform
- MSP: On standby in case of issues
- Result: Success, 3-minute failover
Week 2: Test ECS deployment
- In-house team: Blue-green deployment via Terraform + CodeDeploy
- MSP: Monitoring but not touching
- Result: Success, zero downtime
Week 3: Test full disaster recovery
- In-house team: Rebuild staging environment from Terraform
- MSP: Validates configurations
- Result: Success, 2-hour recovery time
Milestone: Confidence that in-house team can handle production operations
Month 5-6: Renegotiate MSP Contract
Option 1: Gradual reduction
Month 5: Reduce retainer by 50%
- In-house handles all new infrastructure
- MSP provides advisory services only
- Cost reduction: $5,000/month
Month 6: Reduce to on-call support only
- In-house handles 95% of operations
- MSP available for emergency escalations
- Cost reduction: $8,000/month
Month 7: Full termination
- In-house team fully operational
- MSP contract ends
- Savings: $11,250/month
Option 2: Convert to consulting retainer
End managed services: Save $11,250/month
Retain MSP as consultants: $3,000/month (20 hours)
- Architecture reviews
- Security audits
- Capacity planning advice
Net savings: $8,250/month = $99,000/year
Month 6+: Knowledge Transfer and Documentation
Final deliverables before MSP exit:
Infrastructure Documentation:
terraform-infrastructure/
├── docs/
│ ├── architecture.md
│ ├── runbooks/
│ │ ├── rds-failover.md
│ │ ├── ecs-deployment.md
│ │ ├── incident-response.md
│ │ └── disaster-recovery.md
│ └── cost-optimization.md
├── modules/
├── environments/
└── README.md
Operational Procedures:
- On-call rotation schedule
- Incident response playbooks
- Deployment checklists
- Cost optimization strategies
- Security patching procedures
Monitoring and Alerting:
- CloudWatch dashboards
- PagerDuty escalation policies
- Slack integrations
- Weekly infrastructure health reports
Real-World Case Study: Series A SaaS Company
Company: B2B SaaS, $3M ARR, 30 engineers
Before (18 months with MSP):
MSP cost: $18,000/month = $216,000/year
AWS spend: $30,000/month (includes 15% MSP markup)
Infrastructure velocity: 3-day average turnaround
Deployment frequency: 2× per week
On-call incidents: 12/month (MSP handles)
Transition (6 months):
Month 1-2: Hired Senior DevOps Engineer ($165K)
Month 2-4: Terraform implementation (40% coverage)
Month 4-6: Parallel operation (in-house + MSP)
Month 6: MSP contract reduced to advisory ($5K/month)
After (6 months post-transition):
In-house cost: $165,000/year (1 DevOps engineer)
AWS spend: $26,000/month (removed 15% markup)
Infrastructure velocity: Same-day turnaround
Deployment frequency: 10× per week
On-call incidents: 8/month (in-house handles)
Total savings: $216K - $165K = $51,000/year
Hidden savings:
- Engineering productivity: +15% = $300K/year value
- Faster feature shipping: 3 extra features/quarter = $200K/year
Total value: $551,000/year
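Two of the case-study figures can be sanity-checked quickly. This sketch, with a hypothetical helper name, backs the 15% markup out of the $30,000 bill and re-adds the value components listed above. Note that removing a 15% markup means dividing by 1.15, not multiplying by 0.85.

```python
def spend_without_markup(billed_monthly, markup_percent):
    """Back out the underlying AWS spend from a bill that includes MSP markup."""
    return billed_monthly / (1 + markup_percent / 100)


underlying = spend_without_markup(30_000, 15)

# Value components as listed in the case study
direct_savings = 216_000 - 165_000
productivity_value = 300_000
feature_value = 200_000
total_value = direct_savings + productivity_value + feature_value

print(f"AWS spend without markup: ${underlying:,.0f}/month")
print(f"Total value: ${total_value:,}/year")
```

The first result comes out just under $26,100/month, which matches the rounded $26,000 in the case study.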
Unexpected benefits:
- GitOps workflow improved developer experience
- Terraform enabled environment parity (dev = staging = prod)
- Engineers self-service infrastructure via Pull Requests
- Cost optimization (Spot Instances, Graviton) saved $8K/month
Challenges:
- On-call burden (mitigated with PagerDuty + runbooks)
- Knowledge gaps (filled with AWS training)
- First production incident was stressful (but resolved faster than with MSP)
When MSPs Still Make Sense
Don't fire your MSP if:
1. Pre-Product/Market Fit (Under $1M ARR)
You should be focused on customers, not infrastructure:
Your runway: 12 months
Engineering team: 3 developers
AWS spend: $2,000/month
MSP cost: $3,000/month
In-house DevOps engineer: $15,000/month (loaded cost)
Verdict: MSP makes sense
Use MSP to get to Series A, then reassess.
2. Highly Regulated Industries (Healthcare, Finance)
Compliance expertise has value:
HIPAA/SOC 2/PCI-DSS requirements
MSP provides:
- Pre-configured compliant infrastructure
- Audit documentation
- Security monitoring
- Compliance training
Cost of non-compliance: potentially millions of dollars in fines
Cost of MSP: $200K/year
Verdict: MSP might still make sense
But verify MSP uses IaC, or you'll face technical debt later.
3. Expertise Gaps in Specific Technologies
Example: Legacy Windows infrastructure
Your stack: .NET Framework on Windows Server
Your team: Modern cloud-native engineers
MSP expertise:
- Active Directory management
- Windows Server patching
- IIS configuration
- MSSQL administration
Verdict: MSP makes sense temporarily while you modernize
Set a 12-month deadline to migrate to cloud-native.
Hybrid Model: Best of Both Worlds
Pattern: In-house for application infrastructure, MSP for specialized services
In-House Team Handles:
- ECS/Kubernetes clusters
- Lambda functions
- API Gateway
- DynamoDB/RDS
- CI/CD pipelines
- Application monitoring
MSP Handles:
- Network security (WAF, Shield, Firewall)
- Compliance audits (SOC 2, HIPAA)
- Advanced monitoring (Threat detection)
- Disaster recovery testing
Cost: $10,000/month MSP + $165K/year in-house = $285K/year
This works if:
- MSP provides true specialized expertise
- Clear ownership boundaries
- MSP uses IaC (Terraform) that you control
- No emergency situations require "calling the MSP"
The "One DevOps Engineer" Risk
The concern: "What if they quit?"
Mitigation strategies:
1. Documentation-First Culture
Every change includes:
- Terraform code (reviewable, reversible)
- Pull request description (why, not just what)
- Runbook updates
- Architecture decision records (ADRs)
2. Managed Services for Operational Overhead
Use AWS-managed services wherever you can:
- RDS (not self-managed Postgres)
- ECS Fargate (not EC2-based ECS)
- Lambda (not long-running servers)
- Amazon Managed Service for Prometheus (not self-hosted)
Reduces operational burden by an estimated 60%
3. Fractional DevOps Consultant (Backup Plan)
Retainer: $5,000/month (10 hours)
- Monthly architecture review
- Quarterly disaster recovery test
- Emergency escalation contact
Acts as "insurance policy" if engineer leaves
4. Clear Career Path (Retention)
Year 1: Senior DevOps Engineer → Build foundation
Year 2: Staff Engineer → Mentor new hire
Year 3: Engineering Manager → Build team
Compensation growth:
Year 1: $165K
Year 2: $185K + promotion to Staff
Year 3: $210K + team of 3
Reality check: An engineer who built your infrastructure is more likely to stay committed to it than an external MSP that manages 50 clients.
Conclusion: The Tipping Point is Series A
The numbers in this article suggest:
- MSPs make sense for seed-stage startups (under $1M ARR, fewer than 10 engineers)
- Break-even point is $150K-$200K annual MSP spend
- Velocity bottlenecks emerge at 15-20 engineers
- In-house teams pay for themselves via productivity gains
The transition trigger:
# Thresholds: AWS spend in USD/month, team size in engineers, deploys per week
if aws_spend_per_month > 15_000 and engineering_team_size > 15 and deploys_per_week > 5:
    start_transition_plan()
Don't wait until:
- You've hired 2-3 DevOps engineers working around the MSP
- Technical debt is so high migration takes 12+ months
- Your MSP contract auto-renews for another year
Start the conversation now.
Action Items
- Calculate your true MSP TCO: Direct fees + productivity loss + opportunity cost
- Audit current MSP: Do they use IaC? What's average ticket response time?
- Define success metrics: Deployment frequency, incident response time, engineer satisfaction
- Interview DevOps candidates: Even if not ready to hire, gauge market
- Negotiate MSP contract: Add IaC requirements, reduce AWS markup, shorten auto-renewal
- Start Terraform pilot: Deploy one new environment in-house to test feasibility