Latency vs Throughput

Fast single requests and high aggregate volume are related goals, but optimizing one can hurt the other: batching and deep queues raise throughput while adding per-request delay.

Latency

Time for a single request to complete, usually reported as percentiles (p50/p95/p99). Users feel tail latency most.
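A minimal sketch of how those percentiles can be computed from raw samples, using a nearest-rank definition; the latency numbers are illustrative, not from any real service:

```python
# Sketch: latency percentiles from a list of samples (milliseconds).
# Uses a simple nearest-rank definition; real monitoring systems often
# use interpolation or streaming sketches instead.

def percentile(samples, p):
    """Nearest-rank percentile: value at the p-th percent of sorted samples."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 14, 13]  # one slow outlier
for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

Note how a single 250 ms outlier leaves p50 untouched but dominates p95 and p99, which is why averages hide the pain users actually feel.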

Throughput

Work completed per unit time (requests/sec, bytes/sec). Often limited by CPU, IO, or contention.
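Little's Law ties the two metrics together: average concurrency equals throughput times average latency. A tiny sketch with illustrative (assumed) numbers:

```python
# Sketch: Little's Law, mean_in_flight = throughput * mean_latency.
# The figures below are assumptions for illustration, not measurements.

throughput_rps = 200      # requests completed per second
mean_latency_s = 0.05     # 50 ms average time in the system
in_flight = throughput_rps * mean_latency_s
print(in_flight)          # average number of requests in flight
```

This is useful for sizing: at 200 req/s and 50 ms latency, the system holds about 10 requests in flight on average, so a concurrency limit far below that will cap throughput.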

Queueing

As utilization approaches 100%, queues build and latency spikes. Keep headroom for bursts.
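The spike is nonlinear. For a simple M/M/1 queue, mean queueing delay grows as ρ/(1−ρ) service times, so going from 90% to 99% utilization multiplies waiting by roughly 11×. A sketch under that model, with an assumed 10 ms service time:

```python
# Sketch: mean queueing wait for an M/M/1 queue, W_q = S * rho / (1 - rho),
# where S is mean service time and rho is utilization. Illustrative model
# only; real systems have burstier arrivals and multiple servers.

def mm1_mean_wait(service_time_s, rho):
    """Mean time spent waiting in queue (excluding service) for M/M/1."""
    assert 0 <= rho < 1, "queue is unstable at rho >= 1"
    return service_time_s * rho / (1 - rho)

service = 0.010  # 10 ms mean service time (assumed)
for rho in (0.5, 0.9, 0.99):
    print(f"utilization {rho:.0%}: mean wait {mm1_mean_wait(service, rho) * 1000:.0f} ms")
```

At 50% utilization the wait is one service time (10 ms); at 99% it is 99 service times (990 ms), which is the "keep headroom" argument in numbers.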

Backpressure

Push back on producers (shed load, rate limit, bounded queues) to protect downstream systems.

Practical Tips

  • Track p50/p95/p99; p99 drives user experience and incident pain.
  • Bound queues and add timeouts; unbounded queues and retries turn brief overload into latency collapse.
  • Use load shedding when overloaded to keep the system responsive for some users.
  • Prefer idempotent operations so retries don’t amplify failures.
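Several of these tips combine naturally in a retry wrapper: bounded attempts, a per-attempt timeout, and backoff with jitter to avoid retry storms. A sketch assuming the wrapped call is idempotent; `call_with_retries` and the `flaky` stand-in are hypothetical names, not a real library API:

```python
# Sketch: bounded retries with per-attempt timeouts and exponential
# backoff plus full jitter. Safe only because the call is idempotent.

import random
import time

def call_with_retries(call, attempts=3, timeout_s=0.5, base_backoff_s=0.1):
    """Retry an idempotent call a bounded number of times."""
    for attempt in range(attempts):
        try:
            return call(timeout=timeout_s)   # per-attempt timeout
        except TimeoutError:
            if attempt == attempts - 1:
                raise                        # bounded: give up, don't loop forever
            # exponential backoff with full jitter spreads retries out
            time.sleep(random.uniform(0, base_backoff_s * 2 ** attempt))

# Usage with a stand-in call that times out twice, then succeeds:
calls = {"n": 0}
def flaky(timeout):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError
    return "ok"

print(call_with_retries(flaky))
```

Because attempts are bounded and jittered, a downstream outage produces a short, spread-out burst of retries rather than a synchronized storm that keeps the dependency down.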