Series: RDS Rightsizing
RDS Rightsizing: A Practical Guide
Note: This document is a simplified, sanitized version of the actual domain blueprint used to train the RDS Rightsizing agent in our previous case study: Engineering AI Agents: Moving Beyond “Creative Statistics” to Build Pragmatic Dev Tools. The full, production-ready version lives securely in my client’s Confluence as part of their proprietary knowledge base and my personal consulting IP. Consider this a hands-on preview of how I translate engineering heuristics into structured, machine-ready logic.
This guide covers the basics of rightsizing classic RDS instances using standard CloudWatch metrics. We won’t go deep into Aurora (it has its own quirks, enough for a separate article), parameter group tuning, or exotic EC2 instance families. Think of this as a starting point — a solid foundation before you go hunting for more advanced optimizations.
Observations in this guide are based on analysis of over 700 RDS instances in production and staging environments.
Understanding Instance Naming
Before we dive into metrics, let’s decode instance names. Once you get the pattern, the AWS documentation becomes a lot less intimidating.
The format is: db.[family][generation][modifiers].[size]
The db. prefix
Just a label indicating that this is an EC2 instance managed by RDS. Nothing to worry about.
Family
The family tells you what the instance is optimized for:
- t: Burstable general purpose. Cheaper than m, but with caveats for sustained load above ~50% CPU (more on this later).
- m: General purpose. 4 GiB RAM per vCPU. The boring, reliable workhorse.
- r: Memory optimized. 8 GiB RAM per vCPU. Your go-to for databases that live in RAM.
- c: Compute optimized. 2 GiB RAM per vCPU. Not available for RDS, but you’ll see it in EC2.
Generation
Just a number. Higher is better — newer generations consistently deliver better performance per dollar, though occasionally the per-vCPU price may be marginally higher on the latest gen. In practice, staying on newer generations is almost always worth it.
Modifiers
The modifier sits between the generation number and the size. The most important ones — both for this guide and in general — describe the CPU architecture:
- (none) or i: Intel. The default, most widely supported, and most expensive option.
- a: AMD. Roughly 10% cheaper than Intel, on the same x86-64 architecture. Generally a safe swap.
- g: Graviton (ARM). Around 20% cheaper than Intel. In practice, this works perfectly well for RDS. I’ve personally heard exactly one story about ARM performance issues, at a DB architects meetup, from someone who was squeezing every last drop out of a banking application database. Worth noting: that was back when Graviton on RDS was relatively new (6th generation), and I honestly don’t know if it would still be a problem today. For the vast majority of databases it’s a complete non-issue, and the 20% savings are hard to argue with.
There are other modifiers beyond CPU architecture — for example, d indicates a local NVMe SSD attached directly to the instance, which bypasses EBS entirely (and its associated limits). It’s fast, but ephemeral: that storage survives a reboot but not an instance migration or type change. There’s no need to memorize any of this — the AWS documentation is your friend when you encounter something unfamiliar.
Size
The size determines vCPU count. Ignoring the t family (which has micro, small, and medium sizes with 2 vCPUs but progressively less RAM — fine for very low traffic or test environments), the rest follows a clean pattern:
| Size | vCPUs | RAM, GiB (r family example) |
|---|---|---|
| large | 2 | 16 |
| xlarge | 4 | 32 |
| 2xlarge | 8 | 64 |
| 4xlarge | 16 | 128 |
| 8xlarge | 32 | 256 |
| 12xlarge | 48 | 384 |
| 16xlarge | 64 | 512 |
The multiplier in the name is relative to xlarge. So 4xlarge means 4 × xlarge, or 4 × 4 = 16 vCPUs. RAM follows the family ratio.
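The table boils down to one rule, so it fits in a few lines. A minimal Python sketch, assuming only the non-t sizes covered above and the GiB-per-vCPU ratios from the Family section; `vcpus_for_size` and `ram_gib` are illustrative names, not part of any AWS SDK:

```python
# GiB of RAM per vCPU for the non-burstable families described above.
RAM_PER_VCPU_GIB = {"m": 4, "r": 8, "c": 2}


def vcpus_for_size(size: str) -> int:
    """vCPU count for a non-t size name such as 'large', 'xlarge' or '4xlarge'."""
    if size == "large":
        return 2
    if size == "xlarge":
        return 4
    if size.endswith("xlarge"):
        # The numeric prefix multiplies xlarge (4 vCPUs): '4xlarge' -> 4 * 4.
        return 4 * int(size[: -len("xlarge")])
    raise ValueError(f"unsupported size: {size!r}")


def ram_gib(family: str, size: str) -> int:
    """RAM follows the family ratio: vCPUs times GiB per vCPU."""
    return vcpus_for_size(size) * RAM_PER_VCPU_GIB[family]


for size in ["large", "xlarge", "2xlarge", "4xlarge", "16xlarge"]:
    print(f"{size}: {vcpus_for_size(size)} vCPUs, {ram_gib('r', size)} GiB")
```

Sizes outside the pattern (t-family sizes such as micro or medium) raise ValueError on purpose, since they don't follow the multiplier rule.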
Example
db.r8gd.xlarge
- Family: r (memory optimized)
- Generation: 8th
- Modifiers: g (using Graviton CPU), d (instance with a local NVMe SSD drive)
- Size: xlarge (4 vCPUs and 32 GiB of memory)
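The decoding steps above are mechanical enough to script. A minimal Python sketch, assuming the db.[family][generation][modifiers].[size] pattern holds and covering only the modifiers this guide explains; `parse_instance_class` and `InstanceClass` are illustrative names, not an AWS SDK API:

```python
import re
from dataclasses import dataclass

# db.[family][generation][modifiers].[size]; the lazy family group stops at the first digit.
CLASS_PATTERN = re.compile(
    r"^db\.(?P<family>[a-z]+?)(?P<generation>\d+)(?P<modifiers>[a-z]*)\.(?P<size>[a-z0-9]+)$"
)

# Only the modifiers discussed in this guide; anything else needs the AWS docs.
MODIFIER_MEANINGS = {
    "i": "Intel CPU",
    "a": "AMD CPU",
    "g": "Graviton (ARM) CPU",
    "d": "local NVMe SSD",
}


@dataclass
class InstanceClass:
    family: str
    generation: int
    modifiers: list[str]
    size: str

    def cpu_architecture(self) -> str:
        if "g" in self.modifiers:
            return "Graviton (ARM)"
        if "a" in self.modifiers:
            return "AMD"
        return "Intel"  # no CPU modifier, or an explicit 'i'

    def describe_modifiers(self) -> list[str]:
        return [MODIFIER_MEANINGS.get(m, f"'{m}': check the AWS docs") for m in self.modifiers]


def parse_instance_class(name: str) -> InstanceClass:
    match = CLASS_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an RDS instance class: {name!r}")
    return InstanceClass(
        family=match["family"],
        generation=int(match["generation"]),
        modifiers=list(match["modifiers"]),
        size=match["size"],
    )


ic = parse_instance_class("db.r8gd.xlarge")
print(ic.family, ic.generation, ic.describe_modifiers(), ic.size)
```

For db.r8gd.xlarge this prints r 8 ['Graviton (ARM) CPU', 'local NVMe SSD'] xlarge, matching the breakdown above.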
MultiAZ vs Read Replicas: Not the Same Thing
This gets confused constantly, so let’s clear it up before touching any metrics.
MultiAZ is for availability, not performance
A MultiAZ setup creates an exact copy of your primary instance in a different Availability Zone. It doubles your cost and runs as a hot standby — invisible to your application under normal conditions. When the primary goes down (or during a maintenance window), RDS fails over to the standby automatically. That’s it. It does nothing for read throughput.
MultiAZ uses synchronous replication — every write must be confirmed on the standby before it’s acknowledged to the application. This means a failover won’t lose data: everything committed on the primary is guaranteed to be on the standby.
The flip side is that synchronous replication adds a small latency overhead to every write — the application has to wait for the cross-AZ round trip. In practice it’s a minor penalty and a reasonable price for proper DR, but it’s worth knowing it’s there.
Read replicas are for performance, not HA
A read replica is a separate database instance that receives data via asynchronous replication. This means:
- There’s replication lag — the replica is always slightly behind the primary.
- You can route read traffic to the replica to offload the primary.
- The replica can be a different size than the primary.
- It can live in the same AZ if that’s what makes sense.
Can you promote a replica to a new primary if the original dies? Technically yes. But it’s a manual process, you’ll need to reconfigure other replicas and update DNS, and there’s a risk of losing the most recent writes due to replication lag. It’s a last resort, not an HA strategy.
Practical implications
- Non-production environments almost certainly don’t need MultiAZ. The cost is rarely justified outside of production.
- A master with MultiAZ + 2 single-AZ read replicas behind an RDS Proxy can be a better trade-off than a master with MultiAZ + 1 MultiAZ replica — you get better read scalability at comparable or lower cost. That said, RDS Proxy configuration deserves its own article; there are nuances worth understanding before committing to that setup.
- It’s also worth knowing that RDS MultiAZ Cluster exists as a separate deployment option. It combines the benefits of read replicas and MultiAZ in a single setup, and as a bonus it solves the separate-endpoint problem out of the box. It’s often a cheaper option than managing replicas and a proxy separately, but it comes with its own architectural constraints. Another topic that deserves its own write-up.
- More powerful routing solutions like pgpool-II can intelligently distribute read/write traffic across replicas, but they require dedicated infrastructure and have their own operational complexity. Separate article territory.
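The MultiAZ-plus-replicas comparison in the list above is mostly instance counting. A rough sketch, assuming every instance uses the same class and ignoring storage, I/O, and the separately billed RDS Proxy; `Topology` is a hypothetical helper:

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    name: str
    primary_multi_az: bool
    # One entry per read replica: True if that replica is itself MultiAZ.
    replicas_multi_az: list[bool] = field(default_factory=list)

    def billed_instances(self) -> int:
        # A MultiAZ deployment runs a hidden standby, so it bills as two instances.
        primary = 2 if self.primary_multi_az else 1
        return primary + sum(2 if multi_az else 1 for multi_az in self.replicas_multi_az)

    def read_replicas(self) -> int:
        # Standbys never serve traffic; only replicas add read capacity.
        return len(self.replicas_multi_az)


options = [
    Topology("MultiAZ primary + 1 MultiAZ replica", True, [True]),
    Topology("MultiAZ primary + 2 single-AZ replicas", True, [False, False]),
]
for option in options:
    print(f"{option.name}: {option.billed_instances()} billed instances, "
          f"{option.read_replicas()} read replica(s)")
```

Both layouts bill as four instances, but the second has two replicas taking reads instead of one; the proxy’s cost has to fit into that comparison.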
CPU Utilization
Use CPUUtilization from standard CloudWatch metrics. Observe over at least 1 month — 2 to 3 months gives a much clearer picture, especially for workloads with weekly or monthly cycles.
The baseline is never zero
A database instance always carries some baseline CPU load just to exist. The OS, the hypervisor, background processes — none of that is free. Depending on instance size, expect 1.5%–4% as a constant floor. If your CPU utilization never climbs much above that, the database is genuinely idle.
Replication has a cost
Replication isn’t magic: it consumes CPU on both the primary and the replica. Just maintaining the replication process adds roughly 1%–3% on top of the baseline, even with no actual writes to process. With an active stream of changes flowing (MySQL binary logs, PostgreSQL WAL), it’s more.
This matters when you’re evaluating whether to remove a read replica. Two replicas running at 20% each don’t simply merge into one at 40% CPU when you consolidate to one. The OS and replication overhead from the removed replica doesn’t carry over — it just disappears. In practice, two replicas at 20% CPU will consolidate to around 30%–35% on a single replica. Still well within a healthy range.
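The consolidation effect is easy to model as a first approximation. A minimal sketch, assuming the baseline and replication costs from above behave as one fixed overhead per instance and everything else scales linearly with read load; `consolidated_cpu` is a hypothetical helper:

```python
def consolidated_cpu(replica_cpu_pct: list[float], fixed_overhead_pct: float) -> float:
    """Rough CPU % on a single replica after merging the read load of several.

    fixed_overhead_pct is the per-instance cost of merely running a replica
    (OS/baseline plus replication). Each replica pays it today; after
    consolidation it is paid only once, on the surviving replica.
    """
    useful_work = sum(max(cpu - fixed_overhead_pct, 0.0) for cpu in replica_cpu_pct)
    return useful_work + fixed_overhead_pct


# Upper and lower ends of the ranges quoted above: 4% + 3% and 1.5% + 1%.
for overhead in (7.0, 2.5):
    print(overhead, consolidated_cpu([20.0, 20.0], overhead))
```

With the upper ends of both ranges (7% overhead), two replicas at 20% come out at 33%; with the lower ends (2.5%), at 37.5%. That is at or slightly above the 30%–35% seen in practice, so the linear model errs on the cautious side.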
💡 For Aurora, none of this applies — replication is handled at the storage layer and is independent of compute instances.
When to consider downsizing
If CPUUtilization stays consistently below 50% over your observation window with minimal spikes, you have a candidate for downsizing (or replica removal). The key word is consistently — a database that sits at 30% for weeks but spikes to 80% every Friday afternoon is a different conversation.
Spikes require judgment. Reducing CPU capacity means that when those spikes repeat — and they usually do, because most load patterns are cyclical — things will still work, just slower. Whether that’s acceptable depends entirely on context:
- A nightly data import that takes 30 minutes instead of 10? Probably fine.
- The morning login rush that suddenly adds 20% to every authentication? Probably not fine.
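The screening rule above can be expressed as a small function. A sketch under stated assumptions: the input is hourly Maximum CPUUtilization datapoints (hourly averages would hide short spikes), “consistently below 50%” is approximated as the 95th percentile of those hourly maxima staying under 50%, and the 80% spike level and 30-day minimum are placeholders taken loosely from the text. `cpu_verdict` is a hypothetical helper; the final call on spikes stays with a human.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, enough for a screening heuristic."""
    ordered = sorted(values)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def cpu_verdict(hourly_max_cpu: list[float], threshold: float = 50.0,
                spike_level: float = 80.0, min_days: int = 30) -> str:
    """Screen one instance using hourly Maximum CPUUtilization datapoints."""
    if len(hourly_max_cpu) < min_days * 24:
        return "observe longer"
    if percentile(hourly_max_cpu, 95) >= threshold:
        return "not a candidate"
    if max(hourly_max_cpu) >= spike_level:
        # Mostly quiet, but with spikes that will probably recur: a human decides.
        return "candidate, review spikes first"
    return "downsizing candidate"


# 60 days at 30%, with one 85% spike per week.
weekly_spike = [85.0 if hour % 168 == 0 else 30.0 for hour in range(60 * 24)]
print(cpu_verdict(weekly_spike))
```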
Burstable instances (t family): special rules
The t family is a trap if you’re not paying attention. These instances deliver full advertised performance only up to around 50% CPU utilization — above that, you’re running on burst credits.
Burst credits accumulate over time and can be spent in short bursts. As a rough guideline, about 30 minutes of burst per day fits within the standard credit budget — and importantly, this holds approximately true regardless of instance size, because both earn rate and burn rate scale proportionally.
The danger zone:
- Frequent or long spikes above 50%: you’ll exhaust credits and performance will throttle dramatically.
- Unlimited burst mode: available as an option, but the cost is variable and can spiral quickly. Pricing for Linux instances is $0.05 per vCPU-hour for t2/t3 and $0.04 for t4g, charged per vCPU-hour of CPU usage beyond the accumulated credits. Windows instances carry a higher rate, so factor that in if relevant. As a rough rule of thumb, if an instance is spending around 40%–50% of its time in burst, you’re at the breakeven point where switching to a comparable m-family instance becomes cheaper. The detailed mechanism and surplus credit calculations are covered in the AWS documentation: AWS EC2 Burstable Performance Instances.
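To get a feel for how the surplus charge accumulates, here is a rough Python sketch. It relies on AWS’s definition that one CPU credit is one vCPU at 100% for one minute, so surplus is naturally measured in vCPU-hours, and it simplifies heavily: credits earned below the baseline offset usage above it anywhere in the window, while the credit cap and launch credits are ignored. The 50% baseline mirrors this guide’s rule of thumb (AWS documents a specific baseline per instance size), and the $0.05 rate is the Linux t3 figure quoted above, not a verified RDS price. `unlimited_surplus_cost` is a hypothetical helper.

```python
def unlimited_surplus_cost(hourly_cpu_pct: list[float], vcpus: int,
                           baseline_pct: float, rate_per_vcpu_hour: float) -> float:
    """Rough surplus charge for a t-family instance in Unlimited mode.

    Simplifications: credits earned below the baseline offset usage above it
    anywhere in the window; the credit balance cap and launch credits are ignored.
    """
    # CPUUtilization is averaged across vCPUs, so scale by vCPU count.
    used_vcpu_hours = sum(hourly_cpu_pct) / 100 * vcpus
    earned_vcpu_hours = baseline_pct / 100 * vcpus * len(hourly_cpu_pct)
    surplus_vcpu_hours = max(used_vcpu_hours - earned_vcpu_hours, 0.0)
    return surplus_vcpu_hours * rate_per_vcpu_hour


# One day on a 2-vCPU instance: 12 quiet hours at 20%, 12 busy hours at 90%.
day = [20.0] * 12 + [90.0] * 12
print(round(unlimited_surplus_cost(day, vcpus=2, baseline_pct=50.0, rate_per_vcpu_hour=0.05), 4))
```

That day produces 2.4 surplus vCPU-hours, roughly $0.12. Run the same calculation over a month of real datapoints and compare it with the price difference to an m-family instance to find your own breakeven.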
💡 If your t-family instance is regularly spending significant time above 50% CPU, the honest answer is probably to move to an m or r family instance rather than trying to manage credits.
Memory
Database engines are greedy with RAM by design. Unused memory is wasted memory, so they’ll take as much as they can get for caching. This means a healthy database will always show low free memory — and that’s expected.
FreeableMemory: what you’re actually looking at
The FreeableMemory metric shows memory that is available to the OS: genuinely free memory plus cache and buffers that can be reclaimed if needed. The question isn’t “is FreeableMemory low” but rather “is it lower than the baseline for this engine and instance size”.
PostgreSQL
Postgres on RDS (with default parameter group settings) reserves a significant portion of RAM on a fixed basis — shared buffers, index cache, and related structures. Based on observations across over 700 instances, the approximate baselines for default configurations are:
| Instance RAM | Reserved (approx.) |
|---|---|
| 2GB | ~0.9GB |
| 4GB | ~1.7GB |
| 8GB | ~3.5GB |
| 16GB | ~7GB |
| 32GB | ~13GB |
The pattern is roughly 40%–45% of total RAM locked in place (about 45% on the smallest sizes, closer to 40% on the largest). If FreeableMemory stays anchored at this baseline over a long period, the database isn’t really using the headroom. It’s a strong signal you could drop down in size.
💡 These numbers apply to default RDS parameter group settings. If someone has tuned shared_buffers manually, all bets are off — but anyone making those changes in the parameter group already knows what they’re doing.
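To make the PostgreSQL baseline check concrete, here is a minimal sketch. It uses the reserved amounts from the table above and assumes (our interpretation, not something CloudWatch states) that an instance which never grows beyond that reservation reports FreeableMemory of roughly total RAM minus the reservation, ignoring OS usage. FreeableMemory arrives in bytes, so convert to GB first; the 0.5 GB tolerance is an arbitrary placeholder, and the function names are hypothetical.

```python
# Approximate reserved memory (GB) observed with default RDS PostgreSQL parameter
# groups, copied from the table above. Other sizes are not covered.
RESERVED_GB = {2: 0.9, 4: 1.7, 8: 3.5, 16: 7.0, 32: 13.0}


def idle_freeable_gb(total_gb: int) -> float:
    # ASSUMPTION: an instance that never grows past its fixed reservation shows
    # FreeableMemory close to total RAM minus that reservation (OS usage ignored).
    return total_gb - RESERVED_GB[total_gb]


def anchored_at_baseline(freeable_gb: list[float], total_gb: int,
                         tolerance_gb: float = 0.5) -> bool:
    """True if every FreeableMemory sample stays within tolerance of the idle level."""
    baseline = idle_freeable_gb(total_gb)
    return all(abs(sample - baseline) <= tolerance_gb for sample in freeable_gb)


print(anchored_at_baseline([18.8, 19.1, 19.3], total_gb=32))
```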
MySQL
MySQL shows lower reserved baselines, but the variance between configurations is significant enough that we’re not going to publish specific numbers here. The approach remains the same: watch the trend, not the absolute value. If FreeableMemory is stable and high over weeks, the instance has too much RAM.
Sizing decisions from memory metrics
- FreeableMemory stable at baseline over months → strong candidate for downsizing, potentially by two size steps (e.g., 32GB → 8GB). Be careful with that, though: it’s safer to go down by one size, leave it for some time, and check again.
- FreeableMemory slightly variable but consistently above baseline → consider downsizing one step (50% RAM reduction).
- FreeableMemory fluctuating significantly → look more carefully before touching anything.
What basic metrics can’t tell you
Cache hit ratio — the real measure of whether your database is happy with its RAM allocation — requires Enhanced Monitoring or a Prometheus exporter. Standard CloudWatch won’t give you this. What it does give you is SwapUsage.
If SwapUsage is consistently zero or near zero, the instance has enough RAM. The database never had to push memory to disk.
If SwapUsage is non-trivial and growing, the engine is actively swapping pages — which is painful for performance. This is your signal that more RAM would let the database breathe. It won’t help you downsize, but it’ll stop you from making a bad situation worse.
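A simple way to tell “zero” from “growing” is a least-squares slope over the observation window. A sketch, assuming SwapUsage datapoints already converted from the bytes CloudWatch reports to MiB; the 50 MiB “negligible” threshold is a placeholder, and the third outcome (swap present but flat) is left as a judgment call because the guide only describes the other two. Both function names are hypothetical.

```python
def slope(values: list[float]) -> float:
    """Least-squares slope per sample (needs at least two samples)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


def swap_verdict(swap_mib: list[float], negligible_mib: float = 50.0) -> str:
    if max(swap_mib) <= negligible_mib:
        return "enough RAM"
    if slope(swap_mib) > 0:
        return "swap growing: more RAM would help"
    return "swap present but flat: judgment call"


print(swap_verdict([100.0, 200.0, 300.0, 400.0]))
```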
Disk Performance
Two metrics matter here: IOPS (operations per second) and throughput (MB/s). They measure different things:
- IOPS — how many individual read/write operations the disk can handle per second.
- Throughput — how much data moves per second.
For OLTP databases, IOPS is almost always the constraint. Out of ~700 instances analyzed, around 300 showed signs of IOPS pressure. Throughput was the bottleneck in approximately 2 cases. That ratio should calibrate your attention.
Throughput matters more for analytical workloads, data warehouses, or anything involving large sequential reads or writes — streaming, bulk imports, that sort of thing.
Storage types: gp3 vs io2
For RDS, you’re choosing between two relevant EBS volume types:
- gp3 (General Purpose SSD): The right choice for the vast majority of workloads. Good performance, predictable cost, and (above the 400GB threshold on RDS) IOPS/throughput can be configured independently of volume size.
- io2 (Provisioned IOPS SSD): For systems that need sub-millisecond consistency at p99 latency, multi-attach, or extreme IOPS requirements. If you need io2, you’ve probably already exhausted most other optimization options and are operating in fairly specialized territory. io2 Block Express exists for the truly exotic requirements.
gp3 performance tiers
| Volume size | Baseline IOPS | Baseline throughput | Max (configurable) |
|---|---|---|---|
| <400GB | 3,000 IOPS | 125 MiB/s | Fixed |
| ≥400GB | 12,000 IOPS | 500 MiB/s | 64,000 IOPS / 4,000 MiB/s |
For volumes ≥400GB, IOPS and throughput can be scaled independently of each other and independently of volume size. This is useful — you don’t have to overprovision storage to get the performance you need.
This is the most significant improvement over the older gp2 volumes, where performance scaled rigidly with storage size: 3 IOPS per GB with a minimum of 100 IOPS and a maximum of 16,000 IOPS. On gp2, if you needed more IOPS you had to buy more disk — even if you didn’t need the space. gp3 decouples all of that. The baseline performance is also substantially higher for the same price, and the ceiling is 64,000 IOPS versus gp2’s 16,000 IOPS — though in practice, you’re unlikely to get anywhere near those limits on a typical RDS workload.
💡 Keep in mind that for volumes under 400GB, performance scaling is completely locked. If 3,000 IOPS is not enough, your only option is to bump the storage size. This isn’t just an arbitrary AWS pricing restriction; it stems from the underlying EBS infrastructure. Only when you hit the 400GB threshold does AWS start its “under-the-hood magic,” striping the data across multiple EBS volumes that act as a single logical volume capable of scaling independently up to 64k IOPS.
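The difference between the two volume generations is easiest to see side by side. A sketch of the baseline figures only, using the gp2 formula and the RDS gp3 tiers described above; it ignores gp2’s burst behaviour on smaller volumes and any IOPS provisioned on top of the gp3 baseline. Function names are illustrative.

```python
def gp2_baseline_iops(size_gb: int) -> int:
    """gp2: 3 IOPS per GB, at least 100 and at most 16,000 (burst credits ignored)."""
    return min(max(3 * size_gb, 100), 16_000)


def gp3_baseline(size_gb: int) -> tuple[int, int]:
    """RDS gp3 baseline (IOPS, MiB/s), following the tier table above."""
    if size_gb < 400:
        return 3_000, 125
    return 12_000, 500


for size in (100, 300, 500, 2_000, 6_000):
    print(f"{size:>5} GB  gp2: {gp2_baseline_iops(size):>6} IOPS  gp3: {gp3_baseline(size)}")
```

Even at 2,000 GB, gp2’s 6,000 IOPS is half of what gp3 provides as a baseline from 400 GB upwards.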
The instance bottleneck trap
Here’s a mistake that shows up regularly: a team sees high IOPS, decides to provision more IOPS on the EBS volume, pays more, and sees no improvement. The reason is usually that the bottleneck isn’t the disk — it’s the network pipe between the EC2 instance and EBS.
Every EC2 instance has a maximum EBS bandwidth limit that’s independent of the volume’s capabilities. If you’re hitting the instance ceiling, upgrading the volume does nothing.
This is especially relevant for 2xlarge and smaller instances, where the limits are reached surprisingly quickly. Before scaling EBS configuration, check the instance limits:
Amazon EC2 Instance Types - EBS Optimized
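The check is a comparison of three numbers: what you observe, what the volume allows, and what the instance allows. A minimal sketch with hypothetical limits (look up the real ones for your instance class in the documentation referenced above, using the baseline rather than the burst figure for sustained load); the 90% headroom factor is an arbitrary placeholder and `iops_bottleneck` is a hypothetical helper.

```python
from typing import Optional


def iops_bottleneck(observed_iops: float, volume_iops_limit: float,
                    instance_iops_limit: float, headroom: float = 0.9) -> Optional[str]:
    """Return 'instance' or 'volume' for whichever cap is binding, or None."""
    ceiling = min(volume_iops_limit, instance_iops_limit)
    if observed_iops < headroom * ceiling:
        return None  # neither limit is close; look elsewhere
    if instance_iops_limit <= volume_iops_limit:
        return "instance"  # provisioning more volume IOPS will not help
    return "volume"


# Hypothetical numbers: a gp3 volume at 12,000 IOPS behind an instance capped at 10,000.
print(iops_bottleneck(9_600, volume_iops_limit=12_000, instance_iops_limit=10_000))
```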
Burst on smaller instances
Instances below a certain size have a baseline EBS bandwidth and a maximum (burst) bandwidth. The burst level is roughly equivalent to what a 4xlarge can sustain, but it’s available for only about 30 minutes per day via accumulated credits. Outside that window, you’re limited to the baseline.
This is separate from gp3 volume performance — the volume doesn’t burst. The instance network-to-EBS connection does.
💡 Newer instance generations typically bring improvements across the board — IOPS limits, network throughput, and EBS bandwidth all tend to increase. This means a generation upgrade can sometimes resolve a bottleneck without increasing instance size — and a generation upgrade is always cheaper than going up a size.
💡 For the 8th generation specifically: all sizes from large upwards have a baseline EBS bandwidth exceeding gp3’s 3,000 IOPS — but you need to reach 2xlarge before the instance can fully utilize a ≥400GB gp3 volume’s 12,000 IOPS baseline. Worth keeping in mind when matching instance size to storage configuration.
Using latency metrics
If TotalIOPS is well below the instance and volume limits but ReadLatency or WriteLatency is consistently high — that’s the scenario where io2’s sub-millisecond consistency starts to make sense. The problem isn’t volume, it’s predictability.
Conversely, high latency combined with high TotalIOPS points to a capacity problem: you’re saturating the instance pipe, the volume, or both. io2 won’t help there.
Honest note: this specific pattern (high latency, low IOPS) is rare in practice. We haven’t seen enough cases to publish a concrete latency threshold. Treat it as a diagnostic direction rather than a hard rule.
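The two cases can be sorted mechanically once you pick a threshold. A hedged sketch: CloudWatch reports ReadLatency and WriteLatency in seconds, so convert to milliseconds first; `latency_threshold_ms` has no recommended value here (the guide deliberately publishes none), the 80% busy fraction is a placeholder, and `latency_diagnosis` is a hypothetical helper.

```python
def latency_diagnosis(latency_ms: float, total_iops: float, iops_ceiling: float,
                      latency_threshold_ms: float, busy_fraction: float = 0.8) -> str:
    """Sort a high-latency observation into the two cases described above.

    iops_ceiling is the lower of the instance and volume limits.
    """
    if latency_ms < latency_threshold_ms:
        return "latency ok"
    if total_iops >= busy_fraction * iops_ceiling:
        return "capacity: instance pipe and/or volume saturated, io2 will not help"
    return "predictability: low load but slow I/O, io2 may be worth evaluating"


print(latency_diagnosis(20.0, total_iops=1_000, iops_ceiling=12_000, latency_threshold_ms=5.0))
```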
What to watch
- Start with TotalIOPS; there’s no need to split read/write initially.
- If TotalIOPS is consistently high, check instance limits before touching EBS configuration.
- Check ReadLatency/WriteLatency as a secondary signal, particularly if IOPS looks fine but queries are slow.
- Throughput metrics are worth a glance for completeness, but don’t expect them to be the bottleneck.
Database Connections
This one is almost too simple to mention — and yet it’s surprisingly easy to overlook, especially with read replicas. Check the DatabaseConnections metric.
If an instance has registered zero connections — or suspiciously few — over a month-long observation window, it’s an obvious signal that the instance is unused. In most cases that means it’s unnecessary and can be removed. Occasionally it means there’s a misconfiguration somewhere in the services pointing at it, which is also worth knowing.
This matters especially for read replicas. Unlike Aurora (which provides a built-in reader endpoint), standard RDS gives each read replica its own address. Without an RDS Proxy or similar in front of them, traffic distribution is entirely up to whoever configured the application. If one replica is idle while others are struggling, that’s a routing problem worth investigating before drawing any conclusions about whether you need more or fewer replicas.
The most interesting — and surprisingly common — scenario is a master with one read replica where the replica receives zero connections while the master is working flat out. This almost always means the application has a single endpoint configured for both reads and writes. You can confirm this by checking ReadIOPS and ReadThroughput on the master: some read activity is expected (replication itself requires reads), but if read metrics are clearly and consistently higher than their write counterparts, the master is doing everything.
In that situation, there are two reasonable paths:
- Remove the replica. It’s doing nothing except costing money.
- Fix the root cause: route reads to the replica as originally intended, let things run for a while, then revisit the master’s metrics. There’s a reasonable chance the master, now handling only writes, becomes a candidate for downsizing too.
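The two checks described above (idle instances, and an idle replica next to a read-heavy primary) can be scripted once the metrics are exported. A minimal sketch, assuming you have already pulled the Maximum of DatabaseConnections and average ReadIOPS/WriteIOPS per instance for the observation window; `InstanceStats`, `connection_findings`, the 2× read/write ratio, and the instance names are all hypothetical. It compares IOPS only, not ReadThroughput.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    identifier: str
    role: str                # "primary" or "replica"
    max_connections: float   # Maximum of DatabaseConnections over the window
    avg_read_iops: float
    avg_write_iops: float


def connection_findings(instances: list[InstanceStats],
                        read_write_ratio: float = 2.0) -> list[str]:
    findings = []
    idle = [i for i in instances if i.max_connections < 1]
    for inst in idle:
        findings.append(f"{inst.identifier}: no connections in the window, likely unused")
    if any(i.role == "replica" for i in idle):
        for inst in instances:
            reads_dominate = inst.avg_read_iops > read_write_ratio * inst.avg_write_iops
            if inst.role == "primary" and reads_dominate:
                findings.append(f"{inst.identifier}: read-heavy primary with an idle replica, "
                                "check for a single shared endpoint")
    return findings


fleet = [
    InstanceStats("orders-db", "primary", 180, 2400, 300),
    InstanceStats("orders-db-replica-1", "replica", 0, 40, 300),
]
for finding in connection_findings(fleet):
    print(finding)
```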
Network Throughput
Standard CloudWatch provides NetworkReceiveThroughput and NetworkTransmitThroughput. For most RDS setups, client-to-database network traffic is lighter than database-to-disk traffic, so network is rarely the bottleneck.
That said, it can become one — particularly in cases with high read replica traffic, large result sets, or data-intensive integrations. If you’re seeing unexplained performance issues and IOPS/CPU/memory all look healthy, it’s worth checking whether the instance’s network interface is saturated. Instance network limits are in the same documentation table as EBS limits.
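The most common trip-up here is units. A small sketch: CloudWatch reports NetworkReceiveThroughput and NetworkTransmitThroughput in bytes per second, while instance limits are published in gigabits per second. It compares the busier direction against the full limit, which is an assumption about how the limit applies; for instances rated “up to” a figure, the sustained baseline is lower, so use that. The 5 Gbps limit below is hypothetical, as is the function name.

```python
def network_utilization(receive_bytes_per_s: float, transmit_bytes_per_s: float,
                        instance_limit_gbps: float) -> float:
    """Share of the instance network limit used by the busier direction."""
    busier = max(receive_bytes_per_s, transmit_bytes_per_s)
    gbps = busier * 8 / 1e9  # bytes -> bits, then bits -> gigabits
    return gbps / instance_limit_gbps


# Hypothetical: 312.5 MB/s outbound against a 5 Gbps baseline limit.
print(network_utilization(20_000_000, 312_500_000, instance_limit_gbps=5.0))
```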
Where to Go From Here
This guide covers the fundamentals: the metrics available in standard CloudWatch, what they mean, and how to use them to make reasonable sizing decisions without over-engineering the analysis.
What this guide deliberately skips:
- Aurora-specific architecture and its different scaling model
- Cache hit ratio analysis (requires Enhanced Monitoring or Prometheus)
- Parameter group tuning and its impact on memory behavior
- Advanced proxy and routing solutions (RDS Proxy deep-dive, pgpool-II)
- Unlimited burst cost calculations for t-family instances (beyond the breakeven overview above)
- Exotic EC2 instance modifiers and their niche use cases
Most of those topics deserve their own write-up. The metrics covered here are enough to identify the obvious wins — and in a fleet of any meaningful size, there are usually quite a few of those.