
How I Built a Real-Time Chat System with React Native & WebSockets


Why WebSockets Over REST Polling

When a client approached me to build a chat feature for their healthcare coordination app, the first architectural decision was the transport layer. REST polling would have introduced unacceptable latency for a medical context where messages need to arrive within milliseconds. WebSockets gave us a persistent, bidirectional connection that kept the UI feeling instant.

We benchmarked three approaches early on. Short polling at a one-second interval consumed roughly 86,400 requests per device per day and still felt sluggish. Long polling reduced the request count but introduced unpredictable latency spikes between 200 and 1,500 milliseconds depending on network conditions. WebSockets, by contrast, maintained a single TCP connection per device with message delivery consistently under 50 milliseconds on a stable connection.

The stack I settled on was React Native on the front end, a Node.js server using the ws library, and Redis as a pub/sub broker for horizontal scaling. Each device opens a single WebSocket connection on app launch, and the server fans out messages through Redis channels scoped to conversation IDs.

Setting Up the WebSocket Server

The server setup is straightforward, but production readiness requires more than just accepting connections. You need heartbeat monitoring, authentication on the upgrade request, and structured message routing. Here is the core server implementation I used:

import { createServer } from "http";
import { WebSocketServer, WebSocket } from "ws";
import { createClient } from "redis";
import { verifyToken } from "./auth";

const server = createServer();
const wss = new WebSocketServer({ noServer: true });
const redisSubscriber = createClient();
const redisPublisher = createClient();
await redisSubscriber.connect(); // node-redis v4 clients must connect explicitly
await redisPublisher.connect();

// Authenticate during the HTTP upgrade
server.on("upgrade", async (request, socket, head) => {
  try {
    const token = new URL(request.url!, `http://${request.headers.host}`)
      .searchParams.get("token");
    if (!token) throw new Error("missing token");
    const user = await verifyToken(token);
    wss.handleUpgrade(request, socket, head, (ws) => {
      (ws as any).userId = user.id;
      wss.emit("connection", ws, request);
    });
  } catch {
    socket.write("HTTP/1.1 401 Unauthorized\r\n\r\n");
    socket.destroy();
  }
});

// Heartbeat to detect stale connections
const HEARTBEAT_INTERVAL = 30_000;
wss.on("connection", (ws: WebSocket) => {
  (ws as any).isAlive = true;
  ws.on("pong", () => { (ws as any).isAlive = true; });
});

setInterval(() => {
  wss.clients.forEach((ws) => {
    if (!(ws as any).isAlive) return ws.terminate();
    (ws as any).isAlive = false;
    ws.ping();
  });
}, HEARTBEAT_INTERVAL);

The heartbeat mechanism is critical. Without it, you accumulate zombie connections from users who lost network without a clean TCP close. In production, we saw roughly 8 percent of connections go stale without a proper close frame, which would have leaked memory on the server over time.

Handling Offline and Reconnection

Mobile networks are unreliable by nature, so I built an offline queue into the React Native client. Outgoing messages are persisted to AsyncStorage before they hit the socket. A background reconciliation loop replays any unsent messages once the connection is re-established, deduplicating by a client-generated UUID attached to every payload.

import AsyncStorage from "@react-native-async-storage/async-storage";
import { v4 as uuid } from "uuid";

const sendMessage = async (text: string, conversationId: string) => {
  const message = {
    id: uuid(),
    conversationId,
    text,
    timestamp: Date.now(),
    status: "pending",
  };
  await AsyncStorage.setItem(`msg:${message.id}`, JSON.stringify(message));
  socketRef.current?.send(JSON.stringify(message));
};
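The reconciliation loop itself is not shown above, so here is a minimal sketch of it. The storage and socket are injected as plain callbacks, purely so the logic runs without AsyncStorage or a live socket; the key format and message shape match the sendMessage snippet:

```typescript
interface StoredMessage {
  id: string;
  conversationId: string;
  text: string;
  timestamp: number;
  status: "pending" | "sent";
}

interface KeyValueStore {
  getAllKeys(): Promise<string[]>;
  getItem(key: string): Promise<string | null>;
  removeItem(key: string): Promise<void>;
}

async function replayPending(
  store: KeyValueStore,
  send: (payload: string) => void
): Promise<number> {
  // Collect every persisted outgoing message
  const keys = (await store.getAllKeys()).filter((k) => k.startsWith("msg:"));
  const messages: StoredMessage[] = [];
  for (const key of keys) {
    const raw = await store.getItem(key);
    if (raw) messages.push(JSON.parse(raw) as StoredMessage);
  }
  // Replay oldest-first so timestamps stay monotonic on the server
  messages.sort((a, b) => a.timestamp - b.timestamp);
  let replayed = 0;
  for (const msg of messages) {
    if (msg.status !== "pending") continue;
    send(JSON.stringify(msg)); // the server dedupes on the client-generated id
    await store.removeItem(`msg:${msg.id}`);
    replayed += 1;
  }
  return replayed;
}
```

In the app this runs in the socket's onopen handler, with AsyncStorage as the store and the live socket's send as the callback.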

The reconnection logic on the client deserves special attention. A naive approach that reconnects immediately on close will hammer your server during an outage. I implemented exponential backoff with jitter, which distributes reconnection attempts across a time window and prevents a thundering herd when the server comes back online:

class ReconnectingWebSocket {
  private url: string;
  private ws: WebSocket | null = null;
  private retryCount = 0;
  private maxRetries = 10;
  private messageQueue: string[] = [];

  constructor(url: string) {
    this.url = url;
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onopen = () => {
      this.retryCount = 0;
      // Flush any queued messages
      while (this.messageQueue.length > 0) {
        const msg = this.messageQueue.shift()!;
        this.ws?.send(msg);
      }
    };

    this.ws.onclose = (event) => {
      if (event.code !== 1000 && this.retryCount < this.maxRetries) {
        const baseDelay = Math.min(1000 * Math.pow(2, this.retryCount), 30000);
        const jitter = Math.random() * baseDelay * 0.3;
        setTimeout(() => this.connect(), baseDelay + jitter);
        this.retryCount++;
      }
    };
  }

  send(data: string) {
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(data);
    } else {
      this.messageQueue.push(data);
    }
  }
}

The jitter factor of 0.3 was tuned through production observation. During a planned maintenance window where we restarted the WebSocket servers, the first restart, performed before we added jitter, produced a reconnection spike of 4,200 simultaneous connections in under two seconds and overwhelmed the server. With jitter in place, the same reconnection wave spread across roughly twelve seconds.

Message Handler Patterns

On the server, incoming WebSocket messages need structured routing. Rather than a growing switch statement, I used a handler map pattern that keeps message processing modular and testable:

type MessageHandler = (ws: WebSocket, payload: any) => Promise<void>;

const handlers: Record<string, MessageHandler> = {
  "chat:send": async (ws, payload) => {
    const { conversationId, text, id } = payload;
    const message = await db.messages.create({
      data: { id, conversationId, text, senderId: (ws as any).userId },
    });
    await redisPublisher.publish(
      `conversation:${conversationId}`,
      JSON.stringify(message)
    );
    ws.send(JSON.stringify({ type: "chat:ack", messageId: id }));
  },
  "chat:typing": async (ws, payload) => {
    await redisPublisher.publish(
      `conversation:${payload.conversationId}`,
      JSON.stringify({ type: "typing", userId: (ws as any).userId })
    );
  },
  "chat:read": async (ws, payload) => {
    await db.readReceipts.upsert({
      where: { conversationId_userId: {
        conversationId: payload.conversationId,
        userId: (ws as any).userId,
      }},
      update: { lastReadAt: new Date() },
      create: {
        conversationId: payload.conversationId,
        userId: (ws as any).userId,
        lastReadAt: new Date(),
      },
    });
  },
};

wss.on("connection", (ws) => {
  ws.on("message", async (raw) => {
    try {
      const { type, ...payload } = JSON.parse(raw.toString());
      const handler = handlers[type];
      if (handler) await handler(ws, payload);
    } catch {
      // Malformed JSON or a handler failure must not crash the process
      ws.send(JSON.stringify({ type: "error", message: "invalid message" }));
    }
  });
});

This pattern also makes it easy to add middleware for rate limiting or logging. Each handler is a standalone async function that can be unit tested without spinning up a real WebSocket server.
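As a sketch of that middleware idea, rate limiting can be a higher-order function wrapped around any handler in the map. The fixed-window limits below are illustrative assumptions, not the production values:

```typescript
type Handler = (ws: any, payload: any) => Promise<void>;

function withRateLimit(
  handler: Handler,
  maxPerWindow = 30,
  windowMs = 10_000
): Handler {
  // Fixed-window counter per user
  const windows = new Map<string, { count: number; startedAt: number }>();
  return async (ws, payload) => {
    const key = ws.userId ?? "anonymous";
    const now = Date.now();
    const win = windows.get(key);
    if (!win || now - win.startedAt >= windowMs) {
      windows.set(key, { count: 1, startedAt: now }); // start a new window
    } else if (++win.count > maxPerWindow) {
      return; // over the limit: drop silently (a real impl might send an error frame)
    }
    await handler(ws, payload);
  };
}
```

Wiring it up is then one line per handler, e.g. `handlers["chat:send"] = withRateLimit(handlers["chat:send"])`.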

Scaling Beyond a Single Server

The trickiest part was scaling horizontally. A naive WebSocket server keeps all connections in memory, which means a single process can only handle the users connected to it. By publishing every incoming message to a Redis channel keyed by conversation ID, any server instance subscribed to that channel can push the message to the relevant connected clients.

We used Redis pub/sub rather than Redis Streams because the use case was fire-and-forget fan-out. Messages were already persisted to PostgreSQL before being published, so we did not need Redis durability. Each server instance subscribes to the conversation channels for its currently connected users, and unsubscribes when a user disconnects.

In load testing, this architecture handled over 12,000 concurrent connections across three server instances with p95 delivery latency under 85 milliseconds. The key insight was pipelining Redis publish calls in batches rather than awaiting each round trip individually. At 15,000 connections we started seeing Redis CPU saturation, which we resolved by sharding conversations across two Redis instances using consistent hashing.

Common Pitfalls

Not handling backpressure. If the server pushes messages faster than the client can process them, the WebSocket buffer grows unbounded. We added a check on ws.bufferedAmount before sending and dropped non-critical messages like typing indicators when the buffer exceeded 64 KB.
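A minimal sketch of that guard, using the 64 KB threshold from the text and narrowing the socket type to the fields involved:

```typescript
const MAX_BUFFERED = 64 * 1024; // 64 KB, matching the production threshold
const OPEN = 1; // WebSocket.OPEN

function safeSend(
  ws: { readyState: number; bufferedAmount: number; send(d: string): void },
  payload: object,
  critical: boolean
): boolean {
  if (ws.readyState !== OPEN) return false;
  if (!critical && ws.bufferedAmount > MAX_BUFFERED) {
    return false; // drop non-critical traffic such as typing indicators
  }
  ws.send(JSON.stringify(payload));
  return true;
}
```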

Ignoring message ordering. WebSocket guarantees ordering per connection, but when messages pass through Redis pub/sub across multiple server instances, delivery order can shift by a few milliseconds. We used the client-generated timestamp to sort messages on the frontend rather than relying on arrival order.
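The client-side ordering rule can be sketched as follows; the id tie-breaker is an assumption, added because two messages can share a timestamp:

```typescript
interface OrderedMessage {
  id: string;
  timestamp: number; // client-generated, set at send time
}

function byClientTime<T extends OrderedMessage>(messages: T[]): T[] {
  // Sort on the client timestamp, with the id as a deterministic tie-breaker
  return [...messages].sort(
    (a, b) => a.timestamp - b.timestamp || a.id.localeCompare(b.id)
  );
}
```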

Skipping connection draining during deploys. Rolling deploys that kill server processes immediately disconnect all WebSocket clients. We implemented a graceful shutdown that stops accepting new connections, sends a close frame with a custom code to connected clients telling them to reconnect to a different instance, and waits up to ten seconds for existing message processing to finish.
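A sketch of that drain sequence, with the server and socket shapes narrowed so it runs without real connections. The 4100 close code is an illustrative value from the application-reserved 4000-4999 range, and the drain window is a parameter so it can match the ten seconds used in production:

```typescript
interface Closeable { close(): void }
interface DrainSocket { close(code: number, reason: string): void; terminate(): void }

async function drainAndShutdown(
  httpServer: Closeable,
  clients: Iterable<DrainSocket>,
  drainMs = 10_000
): Promise<void> {
  httpServer.close(); // stop accepting new upgrades
  for (const ws of clients) {
    ws.close(4100, "draining"); // client sees the code and reconnects elsewhere
  }
  // Give in-flight message processing up to the drain window to finish
  await new Promise((resolve) => setTimeout(resolve, drainMs));
  for (const ws of clients) ws.terminate(); // hard-close anything left
}
```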

Neglecting authentication token expiry. WebSocket connections are long-lived, but JWT tokens are not. We added a periodic re-authentication mechanism where the server sends a challenge message every thirty minutes, and the client responds with a fresh token. Failure to respond within five seconds triggers a disconnect.
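A sketch of that challenge loop. The message type names ("auth:challenge", "auth:refresh") are assumptions, and the interval and deadline are parameters so the production values (thirty minutes, five seconds) can be passed in:

```typescript
interface ReauthSocket {
  send(data: string): void;
  terminate(): void;
  on(event: "message", listener: (raw: string) => void): void;
}

function startReauth(
  ws: ReauthSocket,
  verify: (token: string) => Promise<boolean>,
  challengeEveryMs = 30 * 60_000,
  deadlineMs = 5_000
): () => void {
  let deadline: ReturnType<typeof setTimeout> | null = null;

  const interval = setInterval(() => {
    ws.send(JSON.stringify({ type: "auth:challenge" }));
    // No fresh token within the deadline: drop the connection
    deadline = setTimeout(() => ws.terminate(), deadlineMs);
  }, challengeEveryMs);

  ws.on("message", async (raw) => {
    const msg = JSON.parse(raw);
    if (msg.type !== "auth:refresh" || !deadline) return;
    clearTimeout(deadline);
    deadline = null;
    if (!(await verify(msg.token))) ws.terminate(); // stale or invalid token
  });

  // Caller invokes this when the socket closes
  return () => {
    clearInterval(interval);
    if (deadline) clearTimeout(deadline);
  };
}
```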

Results in Production

The final system has been running in production for over eight months serving approximately 3,400 daily active users across 180 healthcare provider organizations. Average message delivery latency sits at 47 milliseconds, and the offline queue has successfully replayed over 23,000 messages that would have been lost to network interruptions. Server memory usage stabilized at around 320 MB per instance for 4,000 concurrent connections, well within the 1 GB limit of our container allocation.

Building a real-time chat system taught me that the transport layer is only about twenty percent of the challenge. The other eighty percent is handling the messy reality of mobile networks, horizontal scaling, and the operational concerns that surface only after your first thousand users connect simultaneously.
