Terry Zhang
Terry Zhang
Get LlamaGen Free

What I Learned from Watching Google Docs Work — and Why It Changed Everything

I typed a single sentence into Google Docs.

"Embrace every kind of life" — a phrase from my work in writing, nothing special. Then I did what any curious engineer does at 2 AM with too much coffee: I opened the network tab and watched what happened.

Thirty-four discrete edit operations flew to the server before the sentence even finished rendering.

Thirty-four.

I sat back and stared at that number for a long time. That one moment — watching a handful of characters explode into dozens of atomic network requests — cracked open something I hadn't expected. What I thought was a product question turned into a philosophical one.


The Protocol Is the Product

Most engineers, when they think about collaborative editing, reach for the algorithm first. OT, CRDTs, conflict resolution — the academic machinery. I made the same mistake.

The real insight came earlier, at the protocol layer.

When I traced those requests, what I found wasn't a document being synchronized. It was a stream of commands — each one atomic, composable, and reversible. The operations mapped cleanly into four primitives:

  • create — bring a new entity into existence
  • update — mutate an existing property
  • delete — remove
  • tether — bind an entity to a position in the document

That last one surprised me most. Inserting an image isn't a single operation in Google Docs. It's a choreography: first a special placeholder character, then an addEntity call, then a tether to bind the entity to that position. Verbose, yes. But exquisitely trackable, replayable, and mergeable across any number of concurrent editors.

The protocol wasn't an implementation detail. It was the product.


A Document Is Not a String

Once I understood the protocol, I started working backwards to the data model — and the comfortable intuition that "a document is a string" collapsed almost immediately.

The top-level structure of a Google Doc contains: body, documentStyle, lists, namedStyles, revisionId, and more. The body is a sequence of StructuralElement objects. Paragraphs contain ParagraphElement runs. And the index system doesn't count characters — it counts UTF-16 code units, which means a single emoji (a surrogate pair) consumes two index positions.

These are the kinds of details that sound pedantic until the day you try to build comments, undo/redo, cross-device sync, or tables. Then they become life-or-death decisions. A string-based model will eventually crack. A structured object model keeps evolving.


The Three-Version Problem

After the protocol and the data model, I hit what I now think of as the true heart of collaborative engineering: version management.

The problem is simple to state. Every client is constantly producing diffs and pushing them to the server, which broadcasts them to everyone else. But networks are unreliable. So at any given moment, every screen in a collaboration session is showing a slightly different document.

To prevent this from spiraling into chaos, every command must be stamped with the revision it was based on. And to handle network failures, reconnections, out-of-order arrivals, and concurrent submissions, each client must simultaneously maintain three versions of the document:

  1. Server version — the last confirmed state from the server
  2. Sent version — changes dispatched but not yet acknowledged
  3. Editor version — local edits not yet sent

When a conflict arrives — when someone else's changes land before yours are acknowledged — the system performs something that feels like git rebase. It replays the incoming changes before your pending queue, applies inverse OT corrections to the unconfirmed operations, and reconciles everything without either party losing their work.

This is not academic rigor for its own sake. It's the engineering contract that makes collaboration feel effortless, even when the underlying reality is anything but.


Why This Led Me to Open Source

When I finished filling my Notion with all of this — every protocol detail, every data structure, every version-management edge case — I felt two things at once.

I was in awe of the engineering. It is genuinely beautiful. Precise, resilient, elegant under pressure.

And I was troubled.

Because this kind of capability is extraordinarily expensive to build and maintain. Which means it will naturally concentrate in the hands of a few large platforms. And when your documents live on someone else's server, you don't truly own them — you're renting access to your own thoughts.

That realization sharpened something I'd been circling for months into a clear commitment:

Collaboration should be more accessible, not less. And users should truly own their data.

When those two things become your north star, open source stops being a technical preference and starts being the only honest path. Open source means the capability can be copied and carried forward. Trust can be verified rather than promised. Self-hosting, migration, backup — these become facts, not features on a roadmap.

That winter, tracing a single network request through Google Docs, I didn't know where any of this was pointing. But every line I wrote in those notes was quietly sharpening one conviction:

Build it open. Give users their data back. Make the powerful tools available to everyone.

— Terry