Ever since I read the book Ready Player One I’ve been fascinated by the idea of the metaverse. Imagine that, instead of joining an IRC chatroom, you joined a virtual room filled with objects: chairs, tables, robots, other people. A virtual shared space where you could meet up with your friends and hang out. This is typically envisioned as some sort of virtual reality where you meet people face to face, but the metaverse could be much more general than that.
I started thinking: what would a metaverse protocol look like? Is such an ambitious project even possible? How do we avoid the mess that is the web? Could we keep it simple and extensible? I believe I have a pretty good plan for how to achieve this: I’m calling it the protoverse.
Before we start thinking about the ingredients needed to build a metaverse protocol, let’s first look at some high-level goals:
Accessible. The metaverse shouldn’t exclude those who are blind. This was one of the goals of the original web and we should strive to have an accessibility plan.
Simple. We should learn from the web and not force client vendors to implement a large number of ad-hoc specifications.
Flexible. It should support many different use cases and environments. You should be able to connect to a server and be served a multiplayer game, a room, or whatever experience the server operator wishes.
Interconnected. Like the web with hyperlinks, there should be some way to jump from server to server in a standard way.
The protoverse is the metaverse protocol I’m working on that is trying to achieve all these goals. At a high level, protoverse is a network of abstract virtual spaces. It achieves this with a few key ideas:
You define a “space” with a high-level description like so:
```
(room (shape rectangle)
      (material "gold")
      (name "Satoshi's Den")
      (width 10) (depth 10) (height 100)
      (group
        (table (id welcome-desk)
               (name "welcome desk")
               (material "marble")
               (width 1) (depth 2) (height 1)
               (light (name "desk")))
        (chair (id welcome-desk-chair) (name "fancy"))
        (chair (id throne) (name "throne"))
        (light (location ceiling)
               (name "ceiling")
               (state off)
               (shape circle))))
```
When you first connect to a server, you pull this high-level description to quickly get an idea of where you are and what types of entities are in the environment. The server could generate this description dynamically, or it could be static. This “space” is analogous to an HTML document. So far so good.
At this point, if there is a more detailed description of the room, the client could start pulling additional texture and model information via protocol messages.
Due to this level-of-detail feature, simple text clients can still do something useful here. For instance, if you just want to get an idea of what the room is about without rendering anything, you can use a text-description client:
```
$ ./protoverse serve index.space
serving protoverse on port 1988...

$ ./protoverse client proto://localhost

There is a clean and shiny rectangular room made of solid gold called
Satoshi's Den. It contains four objects: a welcome desk table, fancy
chair, throne chair and ceiling light.
```
As you can see, in this case the client simply parses the high-level description and outputs a description of the room. More advanced clients could render a 2D representation of the room, and even more advanced clients could render full VR-capable experiences.
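To make this concrete, here is a rough sketch of what such a text client does: parse the s-expression description, then summarize the entities it finds. This is purely illustrative Python, not protoverse code (the real client is written in C), and all of the helper names here are made up.

```python
import re

# A trimmed-down space description, in the same shape as the example above.
SPACE = """(room (shape rectangle) (material "gold") (name "Satoshi's Den")
  (group (table (id welcome-desk) (name "welcome desk"))
         (chair (id welcome-desk-chair) (name "fancy"))
         (light (location ceiling) (name "ceiling"))))"""

def parse(src):
    """Read an s-expression into nested Python lists."""
    stack = [[]]
    for tok in re.findall(r'"[^"]*"|[()]|[^\s()]+', src):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok.strip('"'))
    return stack[0][0]

def attr(node, key):
    """First (key value) pair inside node, or None."""
    for child in node[1:]:
        if isinstance(child, list) and child and child[0] == key:
            return child[1]
    return None

def describe(room):
    """Summarize a room node as a sentence of plain text."""
    group = next(c for c in room[1:] if isinstance(c, list) and c[0] == "group")
    objs = ["%s %s" % (attr(o, "name"), o[0]) for o in group[1:]]
    return "A %s room made of %s called %s. It contains %d objects: %s." % (
        attr(room, "shape"), attr(room, "material"), attr(room, "name"),
        len(objs), ", ".join(objs))

print(describe(parse(SPACE)))
# prints: A rectangle room made of gold called Satoshi's Den.
#         It contains 3 objects: welcome desk table, fancy chair, ceiling light.
```

The point is only that the description format is cheap to parse, so even a tiny client can produce something useful from it.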
With WebAssembly (WASM), you can use any programming language to code the metaverse. Protoverse comes with an embedded interpreter that can execute WASM code. You will be able to augment clients to render your space in greater detail, show HUD elements, create multiplayer games, and more.
You can already do a lot without client computation. For instance, your space could be served dynamically, and the client could periodically refetch it to get an updated description of the room. This would be the equivalent of refreshing the page on the web, except that, due to the level-of-detail nature of the protoverse, you wouldn’t need to refetch the entire room: the client could cache models and other details it has previously fetched.
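As a sketch of that caching idea (the names here are hypothetical; nothing in this snippet is part of the protocol yet): on each refresh, the client walks the entity list from the new description and only fetches models it has not seen before.

```python
# Hypothetical level-of-detail cache. fetch_model stands in for
# whatever protocol message ends up serving detailed model data.

model_cache = {}

def refresh(entity_ids, fetch_model):
    """Fetch models for unseen entities; return the ids actually fetched."""
    fetched = []
    for ent_id in entity_ids:
        if ent_id not in model_cache:
            model_cache[ent_id] = fetch_model(ent_id)
            fetched.append(ent_id)
    return fetched

# The first refresh pulls everything; a later refresh that adds one
# new entity only pulls that entity.
print(refresh(["welcome-desk", "welcome-desk-chair"], lambda i: "model:" + i))
# prints: ['welcome-desk', 'welcome-desk-chair']
print(refresh(["welcome-desk", "welcome-desk-chair", "throne"], lambda i: "model:" + i))
# prints: ['throne']
```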
The default, high-level description of the room could include position information, so you will be able to see things that have moved when you refetch the state of the room. Full reloads like this could be a bit jarring, so most likely you wouldn’t want to reload the room just for position changes; instead, these can be served via “object state/position update” network messages.
Simple cases of these network messages could be handled automatically by the client; everything else could be handled by WASM code served by the protoverse server.
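None of these wire formats are nailed down yet, but the split might look something like this sketch, where simple position updates are applied directly and anything else falls through to a server-supplied handler (a plain function here, standing in for sandboxed WASM code):

```python
# Illustrative only: the ("state-update", id, x, y, z) tuple is an
# invented message shape, not the actual protoverse wire format.

class Client:
    def __init__(self, wasm_handler=None):
        self.entities = {}              # cached entity state, keyed by id
        self.wasm_handler = wasm_handler

    def handle_message(self, msg):
        if msg[0] == "state-update":
            # Simple case: the client applies the position update itself.
            _, ent_id, x, y, z = msg
            self.entities.setdefault(ent_id, {})["pos"] = (x, y, z)
        elif self.wasm_handler:
            # Everything else goes to code the server shipped us, which
            # a real client would run inside its WASM sandbox.
            self.wasm_handler(self, msg)

client = Client()
client.handle_message(("state-update", "welcome-desk-chair", 1.0, 2.0, 0.0))
```

The design point is that the client only hardcodes the boring, common messages; anything experience-specific stays on the server side and travels as code.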
Thanks to WASM, we can offload much of the rendering to code that chooses how to render its environment. This does affect accessibility, so we need to be careful, but it has the benefit of avoiding a huge pain point of the web: the massive growth of specifications required to implement web functionality. If we have a very thin client with a small set of rendering APIs (Vulkan? Curses?), then protoverse servers can provide any experience they desire. They could serve full multiplayer video games!
I still have more to think about with respect to server-to-server communication, but there is some interesting potential here. For now, the protocol only deals with client-to-server communication, such as updating entity positions. I think it makes sense for there to be a variety of server-to-server protocols for something like the metaverse; I just haven’t thought too deeply about what those could be yet.
The design space for metaverse protocols is huge. I would love to brainstorm new ideas about how I could improve the protoverse. If you have any ideas feel free to send your thoughts to the protoverse mailing list at:
Also, patches welcome! I’m currently working on the protoverse WASM interpreter. If you want to help hack on the project feel free to email patches to
That’s all for now. I plan on posting more protoverse updates here on my gemlog in the future!