Real-Time Collaboration Suite
Presence, chat, and video that don't fall over under load.
<1s
Reconnect (p50)
6x
Capacity factor
5%+
Packet loss tolerated
Overview
A real-time stack for a remote-first team product: presence, chat, document cursors, and selective-forwarding video — built to recover gracefully when networks misbehave.
The problem
The team's first version used a single Socket.IO server and a permissive reconnection model. As soon as one rooms list crossed a thousand users, reconnections caused thundering-herd cascades.
Approach
- 01
Moved presence and pub/sub onto Redis with consistent hashing across WebSocket workers.
- 02
Replaced naïve reconnect with exponential backoff, server-side jitter, and a slow-start handshake.
- 03
Picked Mediasoup over a full mesh — SFU gives quality control room and survives bandwidth dips.
Outcome
Survived a 6x usage spike during a product launch with no operator intervention.
Median reconnect time after network drop fell from 8.4s to under 900ms.
Video bitrates adapt cleanly to packet loss above 5% without dropping participants.