The Blockchain Hater's Guide To The AT Protocol


Like several of the tech twitterati, I've recently been going goblin mode over at Bluesky, a federated social network in private beta. As a long-time crypto and blockchain skeptic, I decided to take a look at the published documentation for the protocol that underpins Bluesky and write some thoughts.

Caveat, before I go into this too much - the public docs are pretty good, but there are a lot of TBDs and under-defined terms. That said, I applaud the team for what they've been able to put together here -- it's pretty cool.

If I get something wrong, let me know! Would love to correct this or do a followup -- again, this isn't my area of expertise.

At a high level, AT Protocol (ATP from here on out) defines three important components of a decentralized social network -- a way to manage identity (who you are), a way to store records (what you post, who you follow, etc.), and a way to communicate between clients and servers (how you read posts or make them).

Identity

There are two parts to 'who you are' that should be familiar to most developers -- there's a handle (such as @austinlparker.bsky.social or @shitposting.vip), based on a domain name, and a user identifier. The user identifier in ATP is a Decentralized Identifier (DID), which is essentially a cryptographically signed and verifiable GUID. To be somewhat reductive, you can think of it as a modern version of PGP that abstracts away a lot of the pain inherent in managing PKI (or, at least, it makes it someone else's problem).
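To make the handle/identifier split concrete, here's a toy sketch in Python. Everything in it is made up for illustration: `toy_did`, the `did:toy:` prefix, and the in-memory `HANDLES` table are not part of ATP, and a real DID involves signed documents and key management that I'm skipping entirely. The only point is that the handle is a changeable, human-friendly pointer, while the identifier is derived from key material and stays put.

```python
import base64
import hashlib

def toy_did(public_key: bytes) -> str:
    """Derive a stable identifier from key material (hypothetical scheme)."""
    digest = hashlib.sha256(public_key).digest()
    # Base32, lowercased and unpadded, keeps the identifier URL-friendly.
    suffix = base64.b32encode(digest).decode("ascii").lower().rstrip("=")[:24]
    return f"did:toy:{suffix}"

# Hypothetical handle -> identifier table; in ATP the handle is a domain name.
HANDLES: dict[str, str] = {}

def claim_handle(handle: str, did: str) -> None:
    HANDLES[handle] = did

def resolve(handle: str) -> str:
    return HANDLES[handle]

key = b"not-a-real-public-key"
did = toy_did(key)
claim_handle("austinlparker.bsky.social", did)
# Picking up a second handle just adds another name for the same identifier.
claim_handle("shitposting.vip", did)
assert resolve("austinlparker.bsky.social") == resolve("shitposting.vip") == did
```

Followers and records would hang off the identifier, not the name, which is why swapping handles doesn't have to orphan anything.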

At a pretty high level, this seems like a rather novel (if involved) system for managing identity on a federated network. It avoids one of the larger frustrations inherent to ActivityPub, which is that identity is scoped to an instance; if my Mastodon instance were to go away, so would my identity on the network. Admittedly, it does seem like there's a bit of handwaving in terms of actual federation here right now -- Bluesky is, as far as I can tell, the only actual host that supports their limited DID implementation. From my reading, I'd intuit that the full DID spec is extremely heavy, and they decided they only needed a handful of fields?

The exact mechanisms of how the DID server works aren't terribly interesting for the purposes of this post, but here are a few of the questions I have about how this is going to work at scale:

Storage

ATP defines a 'data repository' to be a collection of data published by a single user, expressed as a Merkle Search Tree (MST). Each node of the tree is an IPLD object which is referenced by a hash value.

In plainer terms, whenever you do anything on Bluesky, you're creating a new record. This record can be a follow, a block, a post, whatever. These records all conform to a universal data model which is designed to be linked to other records. Any individual record is immutable, which avoids some problems around consistency and state in a distributed system. A client can fetch this data repository and walk the tree in order to perform actions (like showing you posts). Much of the more complex logic is implemented on the server rather than the client in order to speed up operations.
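Here's a rough Python sketch of why "immutable records referenced by hash" buys you anything. To be clear about what I've swapped in: this uses canonical JSON, SHA-256 hex digests, and a plain binary Merkle tree, not IPLD objects or an actual Merkle Search Tree, and `record_id` and `repo_root` are names I invented. The idea it shows is the same, though: a record's address is its content, so changing anything produces a new record and a new root.

```python
import hashlib
import json

def record_id(record: dict) -> str:
    """Content address: the hash of a canonical serialization of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def repo_root(record_ids: list[str]) -> str:
    """Fold sorted record hashes pairwise into one root hash (toy Merkle tree)."""
    level = sorted(record_ids)
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode("ascii")).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

post = {"type": "post", "text": "hello bluesky"}
follow = {"type": "follow", "subject": "did:example:alice"}
root_before = repo_root([record_id(post), record_id(follow)])

# An "edit" is a new record with a new address; the old bytes never change.
edited = {**post, "text": "hello, bluesky"}
root_after = repo_root([record_id(edited), record_id(follow)])
assert record_id(edited) != record_id(post)
assert root_before != root_after
```

Anyone holding the root hash can check that the records they were handed are the ones the user actually published, without trusting whoever served them.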

If this sounds kinda like the harried dreams of XML and semantic web proponents, well, you ain't far off. Instead of XML, though, it's JSON-like! Whee!!

Again, I don't want to dive too deep here because I'm not actually an expert on this, but here are the questions I have --

That said, you know what this seems like it'd be killer for? Calendaring...

Clients

How do we interact with this protocol? Well, you need a client. Clients and servers talk to each other in ATP using something they call XRPC, which looks an awful lot like gRPC. XRPC seems somewhat unique in that it explicitly calls for schemas to be published on the network, theoretically allowing for them to be easily iterated over and crawled by various automated processes.
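For a sense of what a call looks like on the wire: as far as I can tell, XRPC rides on plain HTTP, with the method's name in the URL path. The sketch below only builds the URL and makes no network request. The `xrpc_url` helper is mine, and the method name is the one I read in the docs at the time of writing, so treat it as an assumption that may have changed.

```python
from urllib.parse import urlencode

def xrpc_url(host: str, method: str, params: dict[str, str]) -> str:
    """Build the URL for an XRPC query: /xrpc/<method id>?<params>."""
    # Sorting the parameters keeps the output stable across runs.
    query = urlencode(sorted(params.items()))
    base = f"https://{host}/xrpc/{method}"
    return f"{base}?{query}" if query else base

url = xrpc_url(
    "bsky.social",
    "com.atproto.identity.resolveHandle",  # assumed method name, per the docs
    {"handle": "austinlparker.bsky.social"},
)
print(url)
# https://bsky.social/xrpc/com.atproto.identity.resolveHandle?handle=austinlparker.bsky.social
```

The reverse-DNS style method name is what makes the "schemas published on the network" bit plausible: the name itself tells you whose schema to go looking for.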

The schema system for XRPC is Lexicon, which defines how requests and responses are communicated between clients and servers. Each schema is a JSON document. This sort of self-documentation is rather refreshing -- not to mention pretty handy in the brave new world of LLM-assisted programming.
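Here's roughly what "the schema is a JSON document" lets you do. The document below is a simplified, hypothetical one in the general shape I saw in the docs; the `com.example.getGreeting` method, its parameters, and the `check_params` helper are all invented, and the real format has more to it than this. The sketch checks a call's parameters against the schema before anything goes over the wire.

```python
import json

# Simplified, hypothetical Lexicon-style document; field names follow my reading
# of the docs and may not match the real schema format exactly.
LEXICON = json.loads("""
{
  "lexicon": 1,
  "id": "com.example.getGreeting",
  "defs": {
    "main": {
      "type": "query",
      "parameters": {
        "type": "params",
        "required": ["name"],
        "properties": {"name": {"type": "string"}, "shout": {"type": "boolean"}}
      }
    }
  }
}
""")

PY_TYPES = {"string": str, "boolean": bool, "integer": int}

def check_params(lexicon: dict, params: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    spec = lexicon["defs"]["main"]["parameters"]
    problems = [f"missing: {k}" for k in spec.get("required", []) if k not in params]
    for key, value in params.items():
        prop = spec["properties"].get(key)
        if prop is None:
            problems.append(f"unknown: {key}")
        elif not isinstance(value, PY_TYPES[prop["type"]]):
            problems.append(f"wrong type: {key}")
    return problems

assert check_params(LEXICON, {"name": "austin"}) == []
assert check_params(LEXICON, {"shout": "yes"}) == ["missing: name", "wrong type: shout"]
```

Since the schema is just data, the same document could drive client codegen, docs, or a crawler, which is the part I find refreshing.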

A couple of notes...

Conclusion

My extremely grudging praise is that it seems like the team over at Bluesky has managed to wrest a single generally-applicable use case from the morass of crypto bullshit that spawned it.

However, I'm not really sure that it's what their users currently want, and that's going to be a significant challenge for them going forward. I can see a model where bsky.social becomes the effective 'default' and sells access to their index, with only a small minority of overall users creating federated spaces and then carefully managing what can be exposed back to the main index (again, I can't actually tell whether this is possible currently; the docs aren't fully complete).

Either way, hats off to them -- and if you need me, I'll be posting with my new pals on Bluesky.