Latency vs Throughput
Low latency and high throughput are related goals, but optimizing one can hurt the other.
Latency
Time for a single request to complete (p50/p95/p99). Users feel tail latency most.
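A minimal sketch of how p50/p95/p99 can be computed from recorded latencies using the nearest-rank method; the sample data and function name are illustrative assumptions, not from the text.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical latencies in milliseconds: note how the tail dwarfs the median.
latencies_ms = [12, 14, 15, 16, 18, 22, 25, 40, 95, 240]
p50 = percentile(latencies_ms, 50)   # typical request
p99 = percentile(latencies_ms, 99)   # what unlucky users actually feel
```

In production, percentiles are usually maintained with streaming sketches (histograms, t-digests) rather than sorting raw samples.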
Throughput
Work completed per unit time (requests/sec, bytes/sec). Often limited by CPU, IO, or contention.
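One way to measure throughput is simply completed operations over elapsed wall-clock time; this harness and its names are assumptions for illustration.

```python
import time

def measure_throughput(handler, requests):
    """Run handler over all requests; return completed ops per second."""
    start = time.perf_counter()
    done = 0
    for req in requests:
        handler(req)
        done += 1
    elapsed = time.perf_counter() - start
    return done / elapsed if elapsed > 0 else float("inf")
```

Measuring against a saturating workload (not a trickle) is what reveals the CPU, IO, or contention ceiling.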
Queueing
As utilization approaches 100%, queues build and latency spikes. Keep headroom for bursts.
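The blow-up near saturation can be seen with the classic single-server M/M/1 queueing model, where mean time in system is 1 / (mu − lambda); the model and the 1000 req/s service rate are illustrative assumptions.

```python
def mean_latency(service_rate, utilization):
    """M/M/1 mean time in system: 1 / (mu - lambda), with lambda = rho * mu."""
    arrival_rate = utilization * service_rate
    return 1.0 / (service_rate - arrival_rate)

mu = 1000.0  # assumed: server handles 1000 req/s in isolation
for rho in (0.5, 0.9, 0.99):
    # latency grows without bound as utilization approaches 100%
    print(f"utilization {rho:.0%}: {mean_latency(mu, rho) * 1000:.1f} ms")
```

At 50% utilization latency is 2 ms; at 99% it is 100 ms for the *same* hardware, which is why capacity planning keeps headroom.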
Backpressure
Push back on producers (shed load, rate limit, bounded queues) to protect downstream systems.
Practical Tips
- Track p50/p95/p99; p99 drives user experience and incident pain.
- Bound queues and add timeouts; unbounded retries create latency collapse.
- Use load shedding when overloaded to keep the system responsive for some users.
- Prefer idempotent operations so retries don’t amplify failures.
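The timeout and retry tips above can be combined into one sketch: a bounded number of attempts with jittered exponential backoff, safe only because the wrapped operation is idempotent. Function names, defaults, and the timeout convention are assumptions for illustration.

```python
import random
import time

def call_with_retries(op, *, attempts=3, timeout_s=0.5, base_backoff_s=0.05):
    """Bounded retries with jittered backoff; op must be idempotent."""
    for attempt in range(attempts):
        try:
            return op(timeout=timeout_s)   # op is expected to honor a timeout
        except TimeoutError:
            if attempt == attempts - 1:
                raise                      # bounded: give up instead of retrying forever
            # full jitter spreads out clients so they don't retry in lockstep
            time.sleep(random.uniform(0, base_backoff_s * 2 ** attempt))
```

Because attempts are capped and backoff is randomized, a downstream slowdown produces a bounded, spread-out retry load instead of a synchronized storm.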