Putting A Trillion-Dollar Industry On-Chain
Recently, I was surprised to learn that a lot of people don't believe it's possible to run a Twitter-scale social network on a blockchain.
Notable critics of this model include none other than Elon Musk himself, whose leaked texts recently showed him concluding: "blockchain Twitter isn't possible." As a result, the dominant conversation I see in my crypto circles is "decentralizing social media is important, but a blockchain with social content on it will never scale." This has been causing people to argue for schemes like partial decentralization, namely putting bits and pieces of content on a blockchain, but with the vast majority residing on a centralized server. For example, maybe your profile goes on-chain, but all your posts, likes, follows, etc. are stored on a centralized server.
Well, as someone who's spent years focused directly on the problem of decentralizing social media, I guess I have a controversial take: I believe it's not only possible to run a Twitter-scale social network on a decentralized blockchain, but that running such a network on a blockchain is actually highly beneficial for many reasons.
As such, in this post I'm going to not only prove that it's technically possible to build a blockchain that can power a Twitter-scale social network, but I'm also going to sketch a concrete roadmap, and argue for why it's so important to have as much content on-chain as possible.
By the way, fun fact: The post you're reading right now is published on DeSo, a blockchain built to power a Twitter-scale social network :)
In this section, I want to talk about what it would cost, in aggregate, to run a blockchain that can power Twitter. We'll start with how much it costs to store one full copy of Twitter's data and work our way up.
Importantly, you'll notice that I'm not going to worry too much about the computational or network overhead of running a node, just storage. That's because these other components become dominated by the storage equation in the long term. In fact, even today, DeSo nodes can process about 50,000 posts per second unparallelized, which is already about an order of magnitude more than what's required to run Twitter at scale. We'll discuss this in the next section, but you can also confirm this yourself by running this test case here.
First, how much data really goes through Twitter on an annual basis? Well, we can actually estimate the answer to that!
That seems like a lot, but how much would it cost to actually store that information?
Well, if you check out AWS's EBS disk pricing as a proxy, you will see that it costs about $0.08 per GB per month, or about $960 per TB per year. AWS is not the cheapest, not by a long shot, but let's roll with it for now.
If it cost $960 per TB, then the cost of storing one year's worth of Twitter's data is about $118,080 per year (123 TB times $960 per TB).
But of course, we want to store more than a year's worth of data. So let's throw a multiplier of 16 onto that (the number of years Twitter has been around), and we wind up with a cost of $1,889,280 per year to store one full copy of Twitter (~$1.9M per year).
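For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation. The function name is mine, and the inputs are the figures quoted above: 123 TB of new data per year, $960 per TB per year, and 16 years of history (which is the multiplier that yields the $1,889,280 total).

```python
def archival_node_cost_usd(tb_per_year, usd_per_tb_year, years_of_history):
    """Yearly cost of storing `years_of_history` years of data at a flat per-TB rate."""
    return tb_per_year * usd_per_tb_year * years_of_history

# Inputs taken from the text above.
one_year = archival_node_cost_usd(123, 960, 1)
full_copy = archival_node_cost_usd(123, 960, 16)

print(f"One year of data: ${one_year:,} per year")
print(f"Full archive:     ${full_copy:,} per year")
```

Swapping in a cheaper storage price than AWS only changes the second argument, which is why the real-world number is likely lower.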
In blockchain parlance, an entity that stores the entire copy of all of the data that's ever gone through a blockchain is called an archival node. So our calculation above technically computed the cost of running a single archival node on a Twitter-scale blockchain.
The question then becomes: How do you incentivize people to run archival nodes, and how many archival nodes do you want to incentivize?
The interesting thing about blockchain is that we can actually precisely incentivize the level of redundancy, aka "decentralization," that we want in our network, and pay no more than exactly what's needed to assure it. This is typically done via a block reward that is essentially a regular payment that's made to people on the network who are doing useful work.
In Bitcoin, people are rewarded for doing useless computation, but in our case we want to reward node operators for storing a copy of all the social media data. This can be done using storage proofs, pioneered by Filecoin and Arweave, which basically allow someone to prove to you that they're storing a full copy of whatever dataset you want them to be backing up. So, a block reward based on storage proofs would look as follows:
Now, with all that said, suppose we want 100 archival nodes on our network (we'll talk about how to increase this in the next section). Then we can set a dynamic block reward that pays people for running nodes in the following way:
Dynamic block rewards are a novel concept I'm introducing here. But they're a perfect solve for what we're trying to do: Assure that we have a sufficient amount of replication in our social network at all times. Notice that I didn't even need to actually specify the value of the block reward-- instead, I just specified a feedback loop based on how many nodes are storing full copies of all the data.
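To give a feel for the feedback loop, here is a toy simulation. It is a sketch under my own simplifying assumptions, not a protocol specification: the reward moves by a fixed 5% step per epoch, and operators are modeled as joining only while an equal share of the reward covers the ~$1.9M yearly cost computed earlier. All function names and parameters are hypothetical.

```python
def adjust_reward(current_reward, active_nodes, target_nodes, step=0.05):
    """One feedback step: raise the reward when under-replicated, lower it when over."""
    if active_nodes < target_nodes:
        return current_reward * (1 + step)
    if active_nodes > target_nodes:
        return current_reward * (1 - step)
    return current_reward

def nodes_willing_to_run(total_reward_per_year, cost_per_node_per_year):
    """Toy market model (assumption): operators join while an equal share covers their cost."""
    return int(total_reward_per_year // cost_per_node_per_year)

def simulate(target_nodes=100, cost_per_node=1_889_280,
             start_reward=1_000_000, epochs=400):
    """Run the feedback loop and return the final yearly reward and node count."""
    reward = start_reward
    for _ in range(epochs):
        nodes = nodes_willing_to_run(reward, cost_per_node)
        reward = adjust_reward(reward, nodes, target_nodes)
    return reward, nodes_willing_to_run(reward, cost_per_node)

final_reward, final_nodes = simulate()
print(f"reward per year: ${final_reward:,.0f}, archival nodes: {final_nodes}")
```

In this toy model the reward starts far too low, climbs until enough operators find it worthwhile, and then stays within one step of the break-even point. Nobody had to know the cost of storage in advance. If `cost_per_node` drops, the same loop walks the reward down with it.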
This mechanism has a "magical" property, which is that it should drive the cost of running the network down over time. For example, if someone finds a cheaper way to store all the content than paying $2M per year to AWS, then they keep the margin. And the market will compete until we have the lowest-cost network powering all of our social apps!
With the above being said, what is the total cost of running the network if we have 100 archival nodes? Well, with this scheme, we can expect it to converge to no more than $190M per year to cover 100x replication, or ~1.9M * 100 (i.e. 100x the cost of running a single node from our previous calculation). This is quite reasonable when one considers that Twitter's revenue was over $1 billion last year, and that we should be able to earn significantly more with a blockchain business model, as we will discuss later on.
Note that care must be taken in designing the mechanism so as to prevent things like Sybil attacks, but I'm glossing over details like this in order to make things easy to read. If you're curious, this video from Filecoin does a great job at discussing the concerns and how to solve them.
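To make the idea of a storage proof a little more concrete, here is a deliberately simplified challenge-response sketch built on a Merkle tree. This is not how Filecoin or Arweave actually implement their proofs, and it does nothing about Sybil attacks (two identities could share one copy). It only shows the basic idea: a verifier who holds a single root hash can challenge a node for a random chunk and check the answer. All names are hypothetical.

```python
import hashlib
import secrets

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(chunks):
    """Return every level of a Merkle tree, leaves first. Odd levels pair the last hash with itself."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(chunks, levels, index):
    """Prover: return the challenged chunk plus the sibling hashes up to the root."""
    path = []
    i = index
    for level in levels[:-1]:
        sibling = i ^ 1
        if sibling >= len(level):
            sibling = i  # last node of an odd level is paired with itself
        path.append(level[sibling])
        i //= 2
    return chunks[index], path

def verify(root, index, chunk, path):
    """Verifier: recompute the root from the chunk and path, then compare."""
    node = h(chunk)
    i = index
    for sibling in path:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root

# The verifier only needs to remember `root`; the node operator keeps the data.
chunks = [f"post-{i}".encode() for i in range(5)]
levels = build_tree(chunks)
root = levels[-1][0]

challenge = secrets.randbelow(len(chunks))  # unpredictable chunk index
chunk, path = prove(chunks, levels, challenge)
print("proof accepted:", verify(root, challenge, chunk, path))
```

A node that threw away part of the data can't answer challenges for the missing chunks, so repeated random challenges make it increasingly unlikely that a node is faking it.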
Importantly, as much as we can go on about "storage proofs" and "block rewards," it is important to mention that it's not unreasonable to expect that a significant number of entities will be willing to store full copies of this data for free. How could this be?
Well, remember that there's inherent value in storing all of the data associated with a social network. Namely, that you can build really cool social apps off of it that you can then monetize with ads or, more likely because we're on a blockchain, with fees. Because apps built on a blockchain will naturally have access to users' wallets, they will be able to monetize in all kinds of ways we'll discuss that involve micro-transactions, NFTs, Social Tokens, paid DMs, and more. And it's not crazy to think that there may be a significant number of apps that will be profitable to the point where it makes business sense for them to spend ~$2M on infrastructure costs just to have a full archival node.
Moreover, we have to anticipate that "nodes as a service" will emerge that will run full archival nodes and then charge other app developers for using their APIs. These node service providers will then compete and drive down the costs while maintaining an open ecosystem. Such an ecosystem has already evolved around Ethereum with Alchemy, Infura, QuickNode, and others, and it would stand to reason that such services would extend to the social case. If anything, the fact that social data is more interesting than pure financial data creates even more incentive for such an ecosystem to form.
But let's say that we want to make our network even more decentralized and sustainable. Maybe we want people to run 10,000 archival nodes rather than just 100. Well, there's actually a pretty easy way to do that: Increase dem fees!
When you build a social network on a blockchain, you have the unique ability to charge micro-fees on every action a user takes: Every post, every like, every follow, every transfer of funds, every NFT interaction... you get the idea.
And what do you do with those fees? Why, you burn 'em of course! Why do you do that? Well, when fees are burned, it's kind of like sharing the earnings of the blockchain with everyone that's holding the blockchain's coins (but not actually because coins aren't securities!). For example, if we are paying $200 million to node operators and we burn $1 billion in fees, then that's a net $800 million in deflation that is going to increase the scarcity of the blockchain's coin. If burning is a hard concept to wrap your head around, you can instead imagine the fees being put into a "pool" from which node operators are paid, and from which coins are distributed back to existing holders like a dividend (but, again, not actually a dividend because coins aren't securities!).
Interestingly, increasing fees can have multiple positive effects on the network:
So, suppose you wanted to charge a mere penny per engagement. We computed previously that we get ~567 billion engagements per year (~189 billion * 3). For now, suppose the number of engagements didn't go down from doing this. Then that would net us ~$5.67 billion in fees per year, which is enough to cover ~2,835 nodes easily! Now, of course the number of engagements would go down if you were charging a penny each-- but that would mean that the cost of running a node would go down too! And so somewhere on the curve there's likely a "goldilocks fee rate" that cuts down on spam and increases decentralization without meaningfully harming high-quality content.
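Here is a rough sketch of that fee math. The `retention` parameter (the fraction of engagements that survive the fee) and the assumption that a node's cost scales in proportion to engagement volume are mine, purely for illustration. The ~567 billion engagements and ~$2M per node come from the text.

```python
ENGAGEMENTS_PER_YEAR = 567_000_000_000  # ~189 billion * 3, from the text
NODE_COST_USD = 2_000_000               # ~$2M per archival node per year, from the text

def annual_fees_usd(fee_cents, retention=1.0):
    """Fee revenue if a fraction `retention` of engagements survives the fee."""
    return ENGAGEMENTS_PER_YEAR * retention * fee_cents / 100

def node_cost_usd(retention=1.0):
    """Assumption: storage cost shrinks in proportion to engagement volume."""
    return NODE_COST_USD * retention

def nodes_funded(fee_cents, retention=1.0):
    """How many archival nodes the yearly fees could pay for."""
    return int(annual_fees_usd(fee_cents, retention) // node_cost_usd(retention))

print(nodes_funded(1))                 # a penny per engagement, nobody leaves
print(nodes_funded(1, retention=0.5))  # half the engagements disappear
```

Under this strictly proportional model, the number of nodes you can fund depends only on the fee, not on how many engagements survive, because revenue and storage cost shrink together. Real costs won't scale that cleanly (the historical archive doesn't shrink, for one), so treat this as an upper-level intuition rather than a forecast.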
How many nodes a network actually needs is a bit of a philosophical question. Bitcoiners would say hundreds of thousands are needed while Solana people would say you only need a few hundred. The issue is further complicated by the fact that modern blockchain networks have different types of nodes. In particular, you can be a validator of transactions without being a full archival node. And indeed, Ethereum is estimated to have relatively few archival nodes relative to the number of validators.
This article is getting long and I don't want to lose people on the difference between validators and full archival nodes. But suffice it to say that I think probably a few dozen full archival nodes are actually enough. And I'll try to explain why I think that in the next section on why storing content on-chain is important.
Up to this point, we've mainly been concerned with proving that it is technically feasible to build a blockchain that can run a Twitter-scale social network. Now it's time to talk about why it's a good approach to do so.
First, let's talk about the money stuff. If you want to have a lot of copies of a social network's data, you have to ask a simple question: How are you going to create the incentives to store all that data? There may be some players who are willing to shell out the ~$2M per year with no remuneration, but blockchains are naturally suited to solving incentive problems like this, e.g. with the Dynamic Block Reward suggested previously.
But, putting aside incentive mechanisms like the Dynamic Block Reward, which can guarantee replication and decentralization of content, blockchains inherently enable devs to build much more interesting things with money than traditional systems allowed previously. This is because blockchains allow for the programming of money in a way that isn't possible with fiat, even with very advanced tools like Stripe.
Some concrete examples that you can try for yourself on DeSo apps like Diamond today are micro-tips, social NFTs, social tokens, and DAO-related activities. Something as simple as putting a tip button next to the like button has apparently been impossible for centralized platforms to figure out, and yet it was the first thing developers on DeSo tried (with unbelievable success). This simple change results in users' first experiences on DeSo apps consisting of being paid for their thoughts almost immediately. As another example, because users' account balances are readily accessible, most apps have resorted to sorting comments by users' wallet balances, which results in bots automatically sorting to the bottom of threads. And we haven't even scratched the surface of what's possible with the tight coupling of money and social media. I personally expect to see a lot of innovation around social NFTs, social tokens, and DAOs as people onboard onto DeSo and developers get more familiar with all of its monetization tools.
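The comment-sorting trick is simple enough to sketch in a few lines. The `Comment` type and its field names below are hypothetical and not DeSo's actual data model. The only point is that when balances are public, ranking by them is a one-line sort.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    author: str
    text: str
    author_balance: int  # hypothetical field: the commenter's on-chain balance

def rank_comments(comments):
    """Order comments so that authors with larger balances appear first."""
    return sorted(comments, key=lambda c: c.author_balance, reverse=True)

thread = [
    Comment("fresh_bot_123", "Click here for free coins", 0),
    Comment("alice", "Great post!", 5_000),
    Comment("bob", "I disagree, and here's why...", 12_000),
]

for comment in rank_comments(thread):
    print(comment.author, "-", comment.text)
```

A bot can still post, but funding thousands of bot wallets well enough to outrank real users gets expensive quickly.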
Next, we have to talk about censorship. If you put a centralized entity in charge of the content, the way social media is set up today, we have to assume that they will censor content based on more than just what's legally required of them. If they didn't, they would literally be the first dictator to ever not abuse their power in some way... But how, specifically, does a blockchain break this paradigm?
Well, when you store content on a blockchain, that makes it so that the firehose of content that is being generated by users is accessible to any app developer in the world that wants to build an app or curate a social feed. This means, for the first time, developers can build social apps without having to solve the cold-start problem. The craziest thing about this that's hard for people to wrap their head around at first is that a post in one app can automatically show up in all the others. And if a new app spins up, all your followers from previous apps can be carried over to the new app (or not-- apps have the option but not the obligation to use existing content). This is why we refer to creating a profile on DeSo as making "the last social media account you'll ever need." You never have to rebuild your following from scratch again!
This is amazing because it means that, for the first time, you can have competition and innovation when it comes to social media and the curation of social content. Instead of a single highly-centralized entity building a one-size-fits-all content policy, developers from all over the world can build apps tailored to particular audiences. Developers in a country like India, for example, can build a feed that's tailored to their particular region, rather than being locked into an app exported by Silicon Valley and thereby importing its biases. Indeed, there will likely be many feeds in each country that all compete for different segments of the population. There can also be verticalized feeds based on topics: A sports-focused feed, a crypto-focused feed, etc. Not to mention wholly novel content experiences that we can't imagine yet.
Importantly, though, if any of the content is not stored on-chain, you immediately break the interoperability of apps (which I refer to as content composability). For example, suppose you put profiles on-chain but you don't put the posts on-chain, as some people in the crypto space are pushing for. Well, now every app has to bootstrap its content from scratch again! OK, but what if you put the posts on-chain but not the follow graph? Same problem! What about the likes? Now you lose the ability for a new app to get engagement from the existing network (remember a post on one app shows up on all the others). Anytime a decision is made to move content off-chain, you are losing the ability to build a developer network effect around that component of the network.
Going back to censorship, the app ecosystem that can form around an open pool of content on a blockchain immediately makes it so that no individual can be blanket-banned. Instead, the decision to silence someone becomes federated across an ecosystem of thousands of apps that are each responsible for making their own independent decisions about what to censor and why. And remember that the apps are probably going to be heavily differentiated from each other by virtue of the fact that they're each competing for their own particular niche of users.
In a recent interview with Lex Fridman, Mark Zuckerberg goes in-depth on his decisions around censorship. He talks about decisions he personally made to nerf certain content, and poses a question to the viewer: "What would you do if you were me?" Indeed, what would one do if they were the God of Information? Perhaps we should reject the premise of the question entirely. Putting content on-chain does exactly this: There is no God of Information when content is on-chain. Instead, competition and choice naturally create diversity of thought.
I've recently started writing again after not writing for a very long time. As such, I'd really appreciate any feedback you have for me in the comments section. Everything from my style of writing to my actual arguments is fair game, and nothing is too critical. Thank you!