Hello, readers. If you read my last post, you know I recently built a terminal-based chess engine in Rust called tty-mate. It had a fully working FIDE-compliant physics engine and a clean UI, but it was just a local application. You had to share a keyboard to play against someone.
The long-term goal was always to make it multiplayer. I wanted to be able to spin up a server and have two terminals connect over the internet to play a seamless game.
Transitioning from a single-player script to a concurrent network server is a massive leap in complexity. It meant dealing with async runtimes, shared state, and network protocols. Here is how I built the multiplayer architecture using tokio.
Splitting the Monolith: Cargo Workspaces
Before writing any network code, I had to clean up my project. The original app was a monolith, which wouldn't work if I wanted a separate server and client binary sharing the same chess rules.
I restructured the repository using Cargo Workspaces. I broke it down into:
- core: The physics engine (the Board, piece logic, and validation).
- client: The Ratatui terminal UI and local input handling.
- server: The new Tokio TCP broker.
This separation of concerns meant the server didn't need to know how to draw a terminal, and the client didn't need to know how to handle 10,000 TCP streams.
Defining a Custom Protocol
To make the client and server talk, I needed a protocol. Instead of pulling in heavier serialization formats like JSON or Protobuf, I wanted to keep it minimal and fast for the MVP. I created a fourth workspace member, api, to act as a shared library for both the client and server.
I designed a custom, string-based TCP messaging protocol. The fields within a message are separated by a colon : and every message ends with a newline \n.
Server Messages:
- Move (From, To, Piece): m:f:t:p (Here, the Piece is for Pawn Promotion logic)
- Game Start (GameId, Color): s:i:c
- Game Aborted: a
Client Messages:
- Move (From, To, Piece): m:f:t:p
- Quit: q
Game Errors:
- Invalid Message: e:i
- Invalid Move: e:m
- No Game Found: e:n
The tty-mate-api library automatically parses these raw strings into safe Rust enums (like ClientMessage::Move) and converts them back to strings for network transmission.
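Here is a minimal sketch of what that parsing layer might look like for client messages. It is not the actual tty-mate-api code: the field types (plain strings), the InvalidMessage error type, and the rule that fields must be non-empty are all assumptions made for illustration.

```rust
use std::fmt;
use std::str::FromStr;

/// Messages a client may send, following the wire format listed above.
/// Squares and pieces are kept as opaque strings (an assumption; the post
/// does not say how they are encoded).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ClientMessage {
    /// Wire form: m:f:t:p
    Move { from: String, to: String, piece: String },
    /// Wire form: q
    Quit,
}

/// Hypothetical error for a line that matches no known message (the e:i case).
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidMessage;

impl FromStr for ClientMessage {
    type Err = InvalidMessage;

    fn from_str(line: &str) -> Result<Self, Self::Err> {
        // Drop the trailing newline, then split on the colon delimiter.
        let body = line.trim_end_matches(|c| c == '\r' || c == '\n');
        let fields: Vec<&str> = body.split(':').collect();
        match fields.as_slice() {
            ["m", from, to, piece]
                if !from.is_empty() && !to.is_empty() && !piece.is_empty() =>
            {
                Ok(ClientMessage::Move {
                    from: from.to_string(),
                    to: to.to_string(),
                    piece: piece.to_string(),
                })
            }
            ["q"] => Ok(ClientMessage::Quit),
            _ => Err(InvalidMessage),
        }
    }
}

impl fmt::Display for ClientMessage {
    /// Serializes back to the wire format, including the terminating newline.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClientMessage::Move { from, to, piece } => {
                writeln!(f, "m:{}:{}:{}", from, to, piece)
            }
            ClientMessage::Quit => writeln!(f, "q"),
        }
    }
}
```

With this shape, a line read from the socket can be turned into an enum with line.parse::<ClientMessage>(), and anything that fails to parse maps naturally onto the e:i error.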
The Server Architecture
With the protocol set, I started writing the server. When dealing with multiplayer games, the server must be the ultimate source of truth. If the server trusted each client to validate its own moves, players could cheat by sending hand-crafted packets.
Here is how the server manages concurrency:
- Shared State: The server holds a global matchmaking queue and a list of active games, safely wrapped in an Arc<Mutex<Server>>.
- Task Per Client: Every time a client connects, the server spawns an independent Tokio background task just for them.
- Shared Game State: When two opponents are matched, they are assigned an Arc<Mutex<Game>>.
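The three points above can be sketched roughly as follows. To keep the example dependency-free, it uses std::sync::Mutex and one OS thread per simulated client as a stand-in for Tokio tasks; in the real server, tokio::spawn would take the place of std::thread::spawn. The Server and Game fields, the ClientId type, the join and simulate functions, and the choice to give White to the waiting player are all assumptions.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical client identifier.
type ClientId = u32;

/// Per-match state shared by exactly two clients.
#[derive(Debug)]
pub struct Game {
    pub white: ClientId,
    pub black: ClientId,
}

/// Global state: the matchmaking queue plus the list of active games.
#[derive(Default)]
pub struct Server {
    queue: VecDeque<ClientId>,
    games: Vec<Arc<Mutex<Game>>>,
}

impl Server {
    /// Pair the newcomer with the longest-waiting client, or enqueue them.
    pub fn join(&mut self, id: ClientId) -> Option<Arc<Mutex<Game>>> {
        match self.queue.pop_front() {
            Some(waiting) => {
                let game = Arc::new(Mutex::new(Game { white: waiting, black: id }));
                self.games.push(Arc::clone(&game));
                Some(game)
            }
            None => {
                self.queue.push_back(id);
                None
            }
        }
    }
}

/// Simulate `n` clients connecting at once, one thread per client.
/// Returns (games created, clients still waiting in the queue).
pub fn simulate(n: u32) -> (usize, usize) {
    let server = Arc::new(Mutex::new(Server::default()));
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let server = Arc::clone(&server);
            thread::spawn(move || {
                // The global lock is held only for the matchmaking step.
                let _game = server.lock().unwrap().join(id);
                // From here on, a client would only touch its own game's lock.
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let state = server.lock().unwrap();
    (state.games.len(), state.queue.len())
}
```

Note that every connecting client goes through the same global lock once, while each per-game lock is only ever shared by two clients.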
The Lifecycle of a Move
To understand how it all pieces together, here is the exact flow of what happens when you play online:
- Client Connects: A user boots the client. They are added to the server's Matchmaking Queue.
- Opponent Connects: Another user connects. The server pops the first user from the queue and creates a new Game.
- Clients Notified: The server sends an s:i:c packet to both clients, assigning them their colors (White/Black) and starting the match.
- The Move: The White client physically makes a move on their terminal.
- Client State Update: The client immediately updates its own local UI so the game feels responsive.
- Server Notified: The client sends the m:f:t:p packet to the server.
- Server Validation: The server locks the shared Game state, checks whose turn it is, and runs the move through the core FIDE state machine. If the move is illegal, it rejects it and sends an e:m error.
- Server Update: If valid, the server officially mutates its internal board state.
- Opponent Notified: The server relays the m:f:t:p string to the Black client.
- Opponent Update: The Black client receives the string and updates its local board.
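The server-side half of that flow (validation, update, relay) could look something like the sketch below. Everything here is hypothetical: this simplified Game only tracks whose turn it is, Outcome and handle_move are illustrative names, and is_legal is a placeholder that checks only the shape of the message, not real chess rules, which live in the core crate. The "-" used for "no promotion" in the usage note is also an assumption.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Color {
    White,
    Black,
}

impl Color {
    fn other(self) -> Color {
        match self {
            Color::White => Color::Black,
            Color::Black => Color::White,
        }
    }
}

/// Simplified stand-in for the shared game state.
pub struct Game {
    pub turn: Color,
    pub moves: Vec<String>,
}

/// What the server should do after receiving a move line.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Send this error line back to the sender.
    Reject(&'static str),
    /// Forward this line to the opponent.
    Relay(String),
}

/// Placeholder for the core crate's legality check (NOT real chess rules):
/// accepts any m:f:t:p line whose from and to fields differ.
fn is_legal(_game: &Game, line: &str) -> bool {
    let fields: Vec<&str> = line.trim_end().split(':').collect();
    matches!(fields.as_slice(), ["m", from, to, _] if from != to)
}

/// Lock the game, check the turn, validate, then mutate and relay.
pub fn handle_move(game: &Arc<Mutex<Game>>, sender: Color, line: &str) -> Outcome {
    let mut g = game.lock().unwrap();
    if g.turn != sender || !is_legal(&g, line) {
        return Outcome::Reject("e:m\n");
    }
    // Only a validated move is allowed to change the authoritative state.
    g.moves.push(line.trim_end().to_string());
    g.turn = g.turn.other();
    Outcome::Relay(line.to_string())
}
```

Calling handle_move(&game, Color::White, "m:e2:e4:-\n") on a fresh game would return a Relay carrying the same line, while the same call made by Black out of turn would return the e:m rejection.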
Screenshots
// Fig. Local Game
// Fig. Multiplayer Game
Benchmarking and the Bottleneck
Once the server was built, I wanted to see how much traffic my local machine could handle before the architecture broke. I wrote a script to simulate a massive TCP connection storm.
I slammed the Tokio server with concurrent clients, spamming chess moves as fast as physically possible.
The results were insane. The zero-cost abstractions of Rust and the Tokio runtime allowed the server to chew through tens of thousands of messages per second. The physics engine was processing complete FIDE rule validations in fractions of a millisecond.
Note: For all tests, each connected client immediately fires off a burst of 20 consecutive moves, with the matchmaking queue timeout strictly set to 10 seconds.
First test: 1000 clients connecting at the same time.
===== BENCHMARK COMPLETE =====
Total Time: 1.08s
Time (Ignoring Orphans): 1.08s
Clients Connected: 1000/1000
Games Started: 500/500
Total Server Replies: 20000
Connection Errors: 0
Throughput: 18438.66 messages/sec
The server chewed through 1000 concurrent clients completely fine while handling around 18438 messages per second.
Second test: 2000 clients connecting at the same time.
===== BENCHMARK COMPLETE =====
Total Time: 11.10s
Time (Ignoring Orphans): 1.10s
Clients Connected: 2000/2000
Games Started: 838/1000
Total Server Replies: 33520
Connection Errors: 0
Throughput: 30507.67 messages/sec
Here, we see a problem. All clients connected successfully without any errors, but only 838 of the expected 1000 games were created, meaning 324 clients timed out in the queue. Let's continue for now.
Third test: 10000 clients connecting at the same time.
===== BENCHMARK COMPLETE =====
Total Time: 12.15s
Time (Ignoring Orphans): 2.15s
Clients Connected: 10000/10000
Games Started: 3989/5000
Total Server Replies: 159560
Connection Errors: 0
Throughput: 74203.78 messages/sec
The server easily chewed through 74,000+ messages per second, but 2,022 clients timed out in the queue. The good news? The server didn't crash; it degraded gracefully and kept the active games alive. The bad news? I had a massive structural bottleneck.
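As a sanity check, the numbers in the three runs above are consistent with each other. The sketch below assumes that each move sent by a matched client produces exactly one counted server reply (an inference from the figures, not something the benchmark script states), and the function names are made up for illustration.

```rust
/// Replies expected if both players in every started game send `burst` moves
/// and each move yields one counted reply (assumption).
pub fn expected_replies(games: u64, burst: u64) -> u64 {
    games * 2 * burst
}

/// Clients that connected but never got paired into a game.
pub fn unmatched_clients(connected: u64, games: u64) -> u64 {
    connected - games * 2
}

/// Messages per second over the given wall-clock time.
pub fn throughput(replies: u64, secs: f64) -> f64 {
    replies as f64 / secs
}
```

With a burst of 20 moves, 500, 838, and 3989 games give 20000, 33520, and 159560 replies, matching the reported totals, and 10000 connected clients minus 2 × 3989 matched ones leaves the 2,022 that timed out. Dividing replies by the "Ignoring Orphans" time only approximately reproduces the reported throughput, since the printed times are rounded to two decimals.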
Because the entire Matchmaker queue was hidden behind a single, global Mutex, 10,000 asynchronous tasks were all fighting for the exact same lock at the exact same time. This is a classic lock-contention problem in systems engineering, closely related to what is known as the "Thundering Herd". This wasn't the case with the game state, since it was physically impossible for more than 2 clients to fight for a given game's lock, resulting in very high throughput, as this example shows.
1000 concurrent clients, but with a burst of 100 consecutive moves per client:
===== BENCHMARK COMPLETE =====
Total Time: 1.15s
Time (Ignoring Orphans): 1.15s
Clients Connected: 1000/1000
Games Started: 500/500
Total Server Replies: 100000
Connection Errors: 0
Throughput: 87137.97 messages/sec
The current architecture survived, but to make this truly production-ready for massive scale, the global lock has to go.
But replacing a global Mutex with an asynchronous Actor model? That's a story for the next post.
