Real-Time Systems with WebSockets
At Vercel, I built the real-time deployment monitoring system. It needed to push updates to thousands of concurrent clients with sub-second latency. WebSockets were the obvious choice — but scaling them is harder than it looks.
Connection management is the hard part. A single server can handle tens of thousands of WebSocket connections, but when you need to scale horizontally, you need a way to broadcast messages to clients connected to any server. We used Redis Pub/Sub — every server subscribes to a channel, and when a message is published, all servers receive it and forward it to their connected clients.
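Here is a minimal sketch of that fan-out pattern, not the production code. To keep it runnable without a Redis instance, a hypothetical in-memory `InMemoryBroker` stands in for the Redis Pub/Sub channel. `GatewayServer`, `ClientSocket`, and the channel name are also illustrative assumptions.

```typescript
type Handler = (message: string) => void;

// Hypothetical stand-in for Redis Pub/Sub: delivers each published
// message to every handler subscribed to the channel.
class InMemoryBroker {
  private handlers = new Map<string, Set<Handler>>();

  subscribe(channel: string, handler: Handler): void {
    const set = this.handlers.get(channel) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(channel, set);
  }

  publish(channel: string, message: string): void {
    this.handlers.get(channel)?.forEach((handler) => handler(message));
  }
}

// Minimal shape of a connected client; a real one would be a WebSocket.
interface ClientSocket {
  send(data: string): void;
}

// One instance per server process. Each instance only knows its own clients.
class GatewayServer {
  private clients = new Set<ClientSocket>();

  constructor(broker: InMemoryBroker, channel: string) {
    // Every server subscribes to the same channel and forwards whatever
    // arrives to its locally connected clients.
    broker.subscribe(channel, (message) => this.forward(message));
  }

  addClient(client: ClientSocket): void {
    this.clients.add(client);
  }

  removeClient(client: ClientSocket): void {
    this.clients.delete(client);
  }

  private forward(message: string): void {
    this.clients.forEach((client) => client.send(message));
  }
}
```

The publisher never needs to know which server holds which client. It publishes once, and each server handles delivery to its own connections.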
Reconnection with backoff. Networks drop connections. Clients go to sleep. We implemented exponential backoff with jitter — starting at 1 second, doubling up to 30 seconds, with random jitter to prevent thundering herds. The client sends its last known state on reconnect, and the server sends a delta of what changed.
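The delay calculation can be sketched as follows. The 1-second base and 30-second cap come from the description above. The jitter strategy is an assumption: this version uses "equal jitter", which keeps half of the capped delay and randomizes the other half. The function name and the injectable `random` parameter are illustrative.

```typescript
const BASE_DELAY_MS = 1000;  // first retry waits up to 1 second
const MAX_DELAY_MS = 30000;  // delays never exceed 30 seconds

// attempt is 0 for the first reconnect, 1 for the second, and so on.
// random is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random
): number {
  // Double the delay on each attempt, capped at the maximum.
  const capped = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, attempt));

  // Equal jitter (assumed strategy): half fixed, half random, so clients
  // that dropped at the same moment spread their reconnects out.
  const half = capped / 2;
  return Math.round(half + random() * half);
}
```

A client would call `backoffDelayMs(attempt)` before each reconnect attempt and reset `attempt` to 0 once a connection succeeds.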
Heartbeats detect silent failures. Every 30 seconds, the server sends a ping. If the client doesn't answer with a pong within 10 seconds, the server closes the connection and the client reconnects. This catches half-open connections, where the TCP connection still looks alive from the server's side but the client is gone.
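The heartbeat logic for one connection can be sketched like this. The 30-second interval and 10-second timeout come from the text. The `Heartbeat` class, the `HeartbeatSocket` interface, and the explicit `tick(now)` clock are assumptions made to keep the example self-contained and testable. A real server would drive `tick` from a timer and call `onPong` from the socket's pong handler.

```typescript
const PING_INTERVAL_MS = 30000; // send a ping every 30 seconds
const PONG_TIMEOUT_MS = 10000;  // wait up to 10 seconds for the reply

// Minimal shape of the socket operations the heartbeat needs.
interface HeartbeatSocket {
  ping(): void;
  close(): void;
}

class Heartbeat {
  private lastPingAt: number;
  private awaitingPongSince: number | null = null;
  closed = false;

  constructor(private socket: HeartbeatSocket, now: number) {
    this.lastPingAt = now;
  }

  // Call when a pong arrives from the client.
  onPong(): void {
    this.awaitingPongSince = null;
  }

  // Call periodically with the current time in milliseconds.
  tick(now: number): void {
    if (this.closed) return;

    if (this.awaitingPongSince !== null) {
      // A ping is outstanding: close the connection if the pong is overdue.
      if (now - this.awaitingPongSince >= PONG_TIMEOUT_MS) {
        this.closed = true;
        this.socket.close();
      }
      return;
    }

    // No ping outstanding: send the next one when the interval has elapsed.
    if (now - this.lastPingAt >= PING_INTERVAL_MS) {
      this.lastPingAt = now;
      this.awaitingPongSince = now;
      this.socket.ping();
    }
  }
}
```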
Backpressure protects the server. If a client is slow to consume messages, the server buffers them up to a limit and then drops the connection. A slow client shouldn't slow down the server or other clients. We track message queue depth per connection and disconnect clients that fall behind.
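A per-connection queue with a depth limit can be sketched as below. This is an illustration under assumptions, not the original implementation: `OutboundQueue`, `SlowSocket`, and the `maxDepth` parameter are hypothetical names, and the actual limit is not stated here.

```typescript
// Minimal shape of the socket operation the queue needs.
interface SlowSocket {
  close(): void;
}

class OutboundQueue {
  private queue: string[] = [];
  dropped = false;

  constructor(private socket: SlowSocket, private maxDepth: number) {}

  // Buffer a message for this client. Returns false if the client has
  // fallen too far behind and the connection was (or already is) dropped.
  enqueue(message: string): boolean {
    if (this.dropped) return false;

    if (this.queue.length >= this.maxDepth) {
      // The buffer is full: disconnect instead of growing without bound.
      this.dropped = true;
      this.queue = [];
      this.socket.close();
      return false;
    }

    this.queue.push(message);
    return true;
  }

  // Call when the socket can accept more data. Removes and returns up to
  // count messages in the order they were enqueued.
  drain(count: number): string[] {
    return this.queue.splice(0, count);
  }

  // Current queue depth, useful as a per-connection metric.
  depth(): number {
    return this.queue.length;
  }
}
```

Because the limit is enforced per connection, one stalled client only ever costs a bounded amount of memory, and the rest of the broadcast loop keeps moving.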
The result was a system that handled 100k+ concurrent connections with latency under 500 ms at the 99.99th percentile. The key insight: WebSockets are simple. Scaling them is a distributed systems problem.