Over Christmas break 2013, I wrote a small computer program to parse the entire bitcoin blockchain and output some statistics. At the time, my main motivation was simply to understand how many bitcoins in circulation are potentially "zombies": coins associated with addresses that have remained untouched for several years and might well be permanently lost.
Since then, I have gathered additional data.
For example, when the infamous transaction malleability problem was first reported (and certain people were using it as an excuse for why bitcoin was destined for failure), I started analyzing the blockchain for occurrences of this signature pattern over time. What I found was that it really was non-existent prior to the initial reports last year, that it lasted for only a very short period of time, and that it affected relatively few transactions.
This showed what many of us already suspected: the transaction malleability problem was being over-hyped and used as an excuse for why certain people (rhymes with Karpa-stole-my-coins-ellis) may have lost a lot of money for entirely different reasons.
I also became interested in how blockchain statistics evolve over time, so I started collecting data on a per-day basis, which lets me graph the results in a visually compelling way.
Throughout the past year the blockchain has grown substantially in size.
It eventually became so large that my statistics program could no longer process it. I have made some small fixes over the course of the year, but fairly soon the entire program may need to be rewritten to process the blockchain correctly.
Another problem I ran into was discovering that there were several signature types my parser was not handling correctly: multi-signature addresses, stealth addresses, and pay to script hash addresses. All three of these signature patterns used to show up very rarely in the blockchain, but recently there has been a big change, which I discuss further in this article.
My program analyzes the blockchain by collating all of the blocks, transactions, inputs, outputs, and public addresses directly into computer memory and making each of them accessible through a high-speed hash map. This lets me sort every one of the nearly 60 million bitcoin public addresses by any key almost instantly. A conventional database could do the same job easily, but it would be a lot slower.
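Below is a minimal Python sketch of that in-memory approach. It is an illustration only and not code from the parser described above. The `AddressRecord` type and its fields are hypothetical. Balances are stored as integer satoshis (1 BTC = 100,000,000 satoshis) so that floating-point rounding never comes into play.

```python
from dataclasses import dataclass

@dataclass
class AddressRecord:
    # Hypothetical per-address record; field names are illustrative only.
    address: str
    balance_sat: int = 0       # current balance in satoshis
    first_seen_day: int = 0    # day index on which the address first appeared
    last_sent_day: int = -1    # day index of the most recent spend (-1 = never spent)

def build_index(records):
    """Collate records into a dict (a hash map) keyed by address for constant-time lookup."""
    return {r.address: r for r in records}

def top_by(index, key, n):
    """Sort every address by an arbitrary key function and return the first n."""
    return sorted(index.values(), key=key, reverse=True)[:n]

# Usage with three made-up addresses.
index = build_index([
    AddressRecord("addrA", 5_000_000_000, 10),
    AddressRecord("addrB", 120_000, 40),
    AddressRecord("addrC", 0, 55, 60),
])
richest = top_by(index, key=lambda r: r.balance_sat, n=2)
print([r.address for r in richest])  # ['addrA', 'addrB']
```

The cost of this design is memory. Every record has to fit in RAM, which is also why a fast-growing blockchain can eventually outgrow this kind of tool.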
The software which produced all of the data for this article is open source and is hosted on Google Code: Blockchain Parser link.
Now that we are at the end of the year, and I have had a little free time over Christmas break, I thought I would provide some new charts and summary reports about the state of the blockchain today.
The first time I ran my blockchain statistics program was about a year ago. Here is what has changed since then:
- We went from 30.2 million total transactions to 55.5 million.
- We went from 24.7 million public keys in use to 58.9 million.
- We went from 1.15 million dust addresses to 1.9 million.
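In relative terms, public keys in use more than doubled over the year and total transactions grew by roughly 84%. The quick calculation below works from the rounded figures in the list, so the percentages are approximate.

```python
# Year-over-year figures from the list above, in millions (already rounded).
changes = {
    "total transactions": (30.2, 55.5),
    "public keys in use": (24.7, 58.9),
    "dust addresses": (1.15, 1.9),
}

def pct_growth(before, after):
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100

for name, (before, after) in changes.items():
    print(f"{name}: +{pct_growth(before, after):.1f}%")
# total transactions: +83.8%
# public keys in use: +138.5%
# dust addresses: +65.2%
```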
Throwaway Intermediate Zero Balance Public Keys
The bitcoin community has often advised people to always generate a new address for every transaction. Well, it seems that people, or more accurately the creators of wallet software, have been listening.
We are generating about 70,000 new zero-balance public keys in the blockchain per day, which is roughly the same as the number of daily transactions. These zero-balance keys are addresses which held some value during the day but had all of it spent by the end of the day.
So much of the wallet software we use today generates intermediate keys that this is now the norm and to be expected.
I have created several graphs to help visualize how keys are generated and used on the network on a daily basis. The breakdown for each chart is as follows:
- OldKeyValueCount : This is the number of bitcoin addresses which had value before the day began and had send transactions performed on them during that day. At the end of the day, these keys still had value in them.
- OldKeyZeroCount : This is the number of bitcoin addresses which had value at the beginning of the day but were completely emptied to a zero balance by the end of the day.
- OldKeyDustCount : This is the number of keys which had value before the day began, had a spend transaction during the day, and were left with a dust balance at the end of the day. Dust is defined as a balance of less than one millibit (about 30 cents).
- NewKeyValueCount : This is the number of brand new bitcoin addresses which were generated on this day and, at the end of the day, had more than dust amount of value in them.
- NewKeyZeroCount : This is the number of brand new bitcoin addresses which were generated on this day and, at the end of the day, were completely zeroed out. Most likely all of these keys were simply used as an intermediate transfer of value, possibly by mixers or other automated wallet software.
- NewKeyDustCount : This is the number of brand new keys generated which, at the end of the day, contained 'dust' amounts of bitcoin. These are fairly undesirable as all of this dust just clutters the blockchain and can, in some contexts, be thought of as a DDOS attack.
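The six categories can be expressed as a small classification function. This is a hedged sketch built on several assumptions. The function names and the input tuple are hypothetical. It uses the 1-millibit dust threshold defined above (100,000 satoshis). It counts an old key only on a day when it was spent from, as the definitions describe. It also treats a balance of exactly 1 millibit as "value".

```python
from collections import Counter

SATOSHIS_PER_BTC = 100_000_000
DUST_THRESHOLD_SAT = SATOSHIS_PER_BTC // 1000  # 1 millibit = 0.001 BTC = 100,000 satoshis

def classify_key(is_new, spent_today, end_balance_sat):
    """Return the daily category name for one key, or None if the key is not counted.

    is_new          -- the key first appeared in the blockchain on this day
    spent_today     -- the key had at least one send (spend) transaction on this day
    end_balance_sat -- the key's balance at the end of the day, in satoshis
    """
    if not is_new and not spent_today:
        return None  # pre-existing keys that were not spent from today are ignored
    prefix = "NewKey" if is_new else "OldKey"
    if end_balance_sat == 0:
        suffix = "Zero"
    elif end_balance_sat < DUST_THRESHOLD_SAT:
        suffix = "Dust"
    else:
        suffix = "Value"  # assumption: exactly 1 millibit or more counts as "value"
    return f"{prefix}{suffix}Count"

def daily_counts(keys):
    """Tally categories for one day; `keys` yields (is_new, spent_today, end_balance_sat)."""
    counts = Counter()
    for is_new, spent_today, end_balance_sat in keys:
        category = classify_key(is_new, spent_today, end_balance_sat)
        if category is not None:
            counts[category] += 1
    return counts

day = [
    (True, True, 0),                       # new key, emptied the same day
    (True, False, 50_000),                 # new key left holding 0.0005 BTC (dust)
    (False, True, 2 * SATOSHIS_PER_BTC),   # old key, partly spent, still holds 2 BTC
    (False, False, 10 ** 9),               # old key, untouched today: not counted
]
print(dict(daily_counts(day)))
```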
Here is the daily bitcoin address distribution by type over the lifetime of the blockchain, shown as absolute counts of bitcoin addresses consumed:
Here is the same 2014 data, displayed as percentages. As you can see, a little over 50% of all new bitcoin addresses seen on the network daily are thrown away. You can also see that somewhere between 5% and 10% of all new public bitcoin addresses on the network are effectively 'spamming' the system with dust balances.
Finally, here is the same normalized percentage graph, this time showing how public key distribution has changed over the entire life of the blockchain. Take particular note of the spam attacks on the network which occurred in July 2011 and the smaller attacks that followed.
Bitcoin Dust and DDOS Attacks
DDOS (distributed denial of service) attacks have caused a lot of chaos on the bitcoin network over the years. In an ideal world the bitcoin network would be difficult to attack because so many connected machines would be running a full node. However, because of the steady growth of mining pools and the limited number of people actually running a full node, bitcoin has become more and more centralized over time.
Ideally, more people would be mining through p2pool, where all of the mining machines also run a full node. link
One way to attack the bitcoin network is to submit a massive number of dust transactions. Since bitcoin is divisible to 8 decimal places, someone can send millions of transactions containing dust amounts of value at very little cost to themselves. This first happened in earnest on July 2, 2011, when roughly 12,000 bitcoin transactions containing only dust amounts were processed. It continued through the beginning of October 2011, when a revision was made to penalize and reject dust transactions on the network.
Originally, bitcoin transactions did not require a fee, and the network would readily accept transactions worth as little as a millionth of a penny. As time went on, changes were made to the protocol to make sending dust transactions cost-prohibitive.
The second big dust attack on the network was on April 28, 2013.
This graph shows how many new dust keys are generated daily.
Here is a graph showing the number of public keys containing dust amounts over time. As you can see, the number is growing steadily, and the blockchain today contains 1.9 million of these keys. They pretty much just pollute the blockchain with dust, noise, and chaff, and serve no functional purpose.
There are some arguably legitimate uses of the bitcoin network which produce a lot of dust. Gambling games, tipping, and micro-transactions all add a lot of small transactions to the daily traffic on the network. However, most of this kind of activity could be moved to side-chains or other mechanisms that don't pollute the main blockchain with so much noise.
Daily Bitcoin Value Change based on New Bitcoin Addresses versus Old
I thought this might be an interesting graphic to show how much of the value on the network flows through newly generated bitcoin addresses, versus value moving through pre-existing keys.
It has been a common refrain in the bitcoin community to always use a new key for every transaction, and it appears people are taking heed.
I have two graphs, the first showing the distribution over the entire lifetime of the blockchain and the second for just 2014. You will see a steady trend of more and more value flowing through newly generated keys daily.
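The sketch below shows one way such a daily split could be computed. It assumes a method rather than describing the actual one. Each transaction output's value is counted as "new" if the receiving address first appeared that same day, and as "old" otherwise. All names are hypothetical.

```python
from collections import defaultdict

def value_by_key_age(outputs, first_seen_day):
    """Split each day's received value between brand-new and pre-existing addresses.

    outputs        -- iterable of (day, address, value_sat), one per transaction output
    first_seen_day -- dict mapping address -> day index on which it first appeared
    Returns {day: (new_key_sat, old_key_sat)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for day, address, value_sat in outputs:
        slot = 0 if first_seen_day[address] == day else 1
        totals[day][slot] += value_sat
    return {day: tuple(pair) for day, pair in totals.items()}

def new_key_share(new_sat, old_sat):
    """Fraction of the day's value that flowed into newly generated addresses."""
    total = new_sat + old_sat
    return new_sat / total if total else 0.0

first_seen = {"a": 0, "b": 1, "c": 1}
outputs = [(1, "a", 300), (1, "b", 600), (1, "c", 100)]
daily = value_by_key_age(outputs, first_seen)
print(daily, new_key_share(*daily[1]))  # {1: (700, 300)} 0.7
```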
Distribution of Bitcoin Balances by Public Key Address
Reporting statistics about how bitcoin balances are distributed by public key has created untold confusion in the past. Too many people get the idea in their head that one public key somehow corresponds to one human being grubbily holding a paper wallet in their hand. However, things are rarely that simple. Most bitcoin wallets generate hundreds of public keys, and HD wallets now generate a new key automatically for every single transaction. Probably even more significant is the number of keys generated by bots, scripts, exchanges, mixers, and the daily churn of a wide array of network activity that is far removed from one person = one public key address. Let me restate that: the vast majority of the transaction volume on the network does not come from individual people pointing their cellphones at a QR code to buy a pastry at 'Strange Donuts'. Most of the traffic is generated by computer systems, not people, and numerous scripts and other software churn the network daily with transaction volume.
For these reasons, I recommend extreme caution when trying to draw any larger conclusions from this distribution, as there are simply too many missing variables. All I can do is report how the value in the blockchain is distributed across individual public keys by balance size.
To begin with, here is a list of every single public key with a balance of 50btc or more. There are about 61,600 as of December 31, 2014.
There are 351,812 bitcoin addresses which contain one bitcoin or more.
A general breakdown is as follows:
- Total Blocks: 336,860
- Total Transactions: 55,479,774
- Total Inputs: 137,314,100
- Total Outputs: 152,906,795
- Found 58,951,529 addresses which have ever been used.
- Found 54,995,083 addresses with a zero balance.
- Found 1,983,114 'dust' addresses (less than 1 mBTC) with a total balance of 307.84464 BTC
- Found 1,621,919 addresses with a balance greater than 1 mBTC but less than 1 BTC, total balance: 141,767 BTC
- Found 231,774 addresses with a balance greater than 1 BTC but less than 10 BTC, total balance: 588,853 BTC
- Found 104,208 addresses with a balance greater than 10 BTC but less than 100 BTC, total balance: 3,633,244 BTC
- Found 13,889 addresses with a balance greater than 100 BTC but less than 1,000 BTC, total balance: 3,165,967 BTC
- Found 1,449 addresses with a balance greater than 1,000 BTC but less than 10,000 BTC, total balance: 3,215,649 BTC
- Found 90 addresses with a balance greater than 10,000 BTC but less than 100,000 BTC, total balance: 2,453,358 BTC
- Found 3 addresses with a balance greater than 100,000 BTC, total balance: 377,564 BTC
First, let us focus on bitcoin public keys containing one bitcoin or less. In total they hold about 140,000 bitcoins, or only about 1% of all bitcoins in existence. This graph breaks these public keys down by balance size: less than one millibit, 5 millibits, and so on.
Finally, here is a graph showing how many bitcoin addresses hold balances of one bitcoin or less. As you can see, 3.6 million bitcoin addresses hold less than one bitcoin and, of those, 1.9 million contain merely 'dust', a minuscule amount of bitcoin.
In other words, 1% of the value on the network accounts for something like 90% of all non-zero public keys. This is why it is ridiculous to treat the 'number of keys' as a measure of the 'number of wallets'.
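You can check the arithmetic behind the 1% versus 90% observation directly against the breakdown above. The nine bucket counts add up exactly to the 58,951,529 addresses ever used. In this sketch, the value share is computed against the roughly 13.58 million BTC held across all the listed buckets.

```python
# (label, address count, total BTC) for each balance bucket, copied from the breakdown above.
buckets = [
    ("zero balance", 54_995_083, 0),
    ("dust, under 1 mBTC", 1_983_114, 307.84464),
    ("1 mBTC to 1 BTC", 1_621_919, 141_767),
    ("1 to 10 BTC", 231_774, 588_853),
    ("10 to 100 BTC", 104_208, 3_633_244),
    ("100 to 1,000 BTC", 13_889, 3_165_967),
    ("1,000 to 10,000 BTC", 1_449, 3_215_649),
    ("10,000 to 100,000 BTC", 90, 2_453_358),
    ("over 100,000 BTC", 3, 377_564),
]

total_addresses = sum(count for _, count, _ in buckets)
total_btc = sum(btc for _, _, btc in buckets)
nonzero_addresses = total_addresses - buckets[0][1]

# The two non-zero buckets below one bitcoin.
small = buckets[1:3]
small_addresses = sum(count for _, count, _ in small)
small_btc = sum(btc for _, _, btc in small)

print(f"addresses ever used: {total_addresses:,}")
print(f"under 1 BTC: {small_addresses:,} addresses holding {small_btc:,.0f} BTC")
print(f"share of non-zero addresses: {small_addresses / nonzero_addresses:.1%}")
print(f"share of value: {small_btc / total_btc:.2%}")
```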
Next is a graph showing the value distribution over time, by address balance size, since the beginning of the blockchain.
Now, to narrow in, here is what the value distribution looked like in 2014 as a percentage of the total; meaning what percentage of all bitcoins are held in addresses containing various amounts.
Messages in the Blockchain
Satoshi set an interesting precedent when he embedded a message in the very first block in the blockchain. His now infamous statement "The Times 03/Jan/2009 Chancellor on brink of second bailout for banks" is enshrined for all of history.
Since bitcoin transactions contain script code, it is possible for a person to embed an ASCII message into the blockchain. Most general-purpose wallet software does not offer this as a feature for the end user, which is why it is rarely seen in most transactions. However, that doesn't prevent some from stamping their own personal "Killroy was here" throughout the blockchain. (Interestingly, "Killroy was here" has never shown up in the blockchain to date.)
Most ASCII text in the blockchain is simply a message contained in a newly mined block transaction (called the coinbase transaction). Usually these messages simply identify the mining pool that created the block.
- Mined by BTC Guild
- EclipseMC: Aluminum Falcons?
- hi from poolserverj
Early in the lifetime of the blockchain, one miner, Luke-Jr, would frequently inject Bible verses into his newly mined blocks. This really annoyed a number of people, who complained about it on the bitcointalk forums. Ultimately there was nothing they could do about it other than embed their own messages insulting him in return. To this day, his religious teachings remain embedded in the blockchain forever.
I recently added a feature to my parsing tool to search for all ASCII text inside the blockchain and highlight it. The results can be found in the following text file that you can read at your leisure.
Link to all ASCII text in the blockchain: link
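For readers who want to try something similar, here is a minimal `strings`-style scan for runs of printable ASCII in raw bytes. It is a sketch and not the search feature of the parsing tool mentioned above. The `MIN_LEN` cutoff is a hypothetical choice.

```python
import re

MIN_LEN = 8  # hypothetical minimum run length, to skip short accidental matches
# Any run of printable ASCII bytes (space through '~') at least MIN_LEN long.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)

def ascii_runs(raw: bytes):
    """Return the printable ASCII strings embedded in a blob of raw bytes."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(raw)]

# A made-up coinbase-style blob: non-printable bytes around a pool message.
blob = b"\x03\x01\x02\x03" + b"hello from a hypothetical pool" + b"\x00\xff\x10"
print(ascii_runs(blob))  # ['hello from a hypothetical pool']
```

A naive scan like this has one known limitation. If a length byte happens to be printable, it gets glued onto the text. In the genesis block, for example, Satoshi's message is preceded by the push-length byte 0x45 (69, the message's length), which is the ASCII letter 'E'.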
There are a few really strange and bizarre entries that people have embedded in the blockchain that are probably worth pointing out.
For example, someone recently found it amusing to Rick Roll the blockchain by embedding the lyrics to Rick Astley's "Never Gonna Give You Up". They can be found in about a dozen blocks starting with #268060 on May 11, 2013.
In block #276393 on December 22, 2013 a group calling themselves the IAmTimeLoop Party injected some truly bizarre text into the blockchain about sending messages to people in the future.
There is nothing preventing people from polluting the blockchain with messages and data but, if you think of the blockchain as a scarce resource, it does seem more like a bit of cyber-graffiti than anything else.
Signature Types Over Time
The vast majority of all bitcoin output scripts in the history of the blockchain use the single standard format that everyone is used to seeing by now: the number '1' followed by a string of characters representing a bitcoin pay-to address. For example, here is my tip-jar address '1NY8SuaXfh8h5WHd4QnYwpgL1mNu9hHVBT' link. You are welcome to test it by sending a small amount of bitcoin.
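The leading '1' is not arbitrary. A standard address is the Base58Check encoding of three parts:

- a version byte
- a 20-byte public-key hash
- a 4-byte checksum, taken as the first four bytes of a double SHA-256

Version byte 0x00 is what produces the leading '1'. Pay-to-script-hash addresses use version 0x05, which is why they start with '3'. Below is a minimal decoder, shown on the well-known genesis block address rather than on any address from this article.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58decode_check(address: str):
    """Decode a Base58Check string into (version_byte, payload), verifying its checksum."""
    n = 0
    for ch in address:
        n = n * 58 + B58_ALPHABET.index(ch)  # raises ValueError on invalid characters
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # Each leading '1' character encodes one leading zero byte.
    leading_zeros = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * leading_zeros + raw
    body, checksum = raw[:-4], raw[-4:]
    if hashlib.sha256(hashlib.sha256(body).digest()).digest()[:4] != checksum:
        raise ValueError("bad checksum")
    return body[0], body[1:]

version, pubkey_hash = b58decode_check("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
print(version, pubkey_hash.hex())
```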
There are several newer types of signatures worth noting. The first multi-sig address appeared in the blockchain on January 29, 2012. For whatever reason, multi-sig has not really taken off. Today only 37,138 multi-sig addresses in the blockchain contain any value, and multi-sig has been used only 42,145 times.
Pay To Script Hash
While the multi-sig feature never seems to have been used much, the same cannot be said for the 'pay to script hash' feature, which was introduced on March 7, 2012.
'Pay to script hash' is much more general purpose and can be used to implement multi-sig as well as many other features. Very recently, it has come into heavy use. In fact, over 800,000 bitcoins have been transferred to 'P2S' addresses in just the past month alone! That is almost 6% of all bitcoins in existence!
Here is a graph showing the number of multi-sig and P2S addresses in the blockchain over time.
Largest bitcoin balances are now in 'Pay To Script' hash addresses
Quite recently, the top bitcoin address on the blockchain, containing 134,170 bitcoins, switched to a P2S address. link
Here is a link to a spreadsheet which shows the top 60 or so Pay-To-Script addresses which now hold 721,301 bitcoins! link
Lately you may have started hearing about 'stealth' addresses. These are special addresses which technically aren't addresses at all. They are a kind of metadata that wallet software (such as DarkWallet) can use to hide any link in the blockchain between a specific send and receive address. It is not yet certain how popular stealth addresses will become. They are non-standard, and most of what they claim to accomplish can be done by simply generating a unique address for every single transaction you make.
Throughout the blockchain there are a number of output scripts which appear to have no valid 'pay-to' bitcoin address. Any bitcoins sent to these bogus output scripts are presumably lost forever. In some cases this could have happened on purpose. But most of the time it was likely the result of a bug in wallet or script code.
Here is an example of the first transaction I found in the blockchain with outputs that could not be decoded. link
The 45.82 bitcoins that were sent on October 28, 2011, are presumably lost forever.
While I cannot be 100% sure of my totals, the number of bitcoins in the blockchain associated with unspendable outputs appears to be on the order of 2,600. So, while most of us would be thrilled to get our hands on 2,600 bitcoins, in the grand scheme of things it isn't that much out of the total.
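To make the signature types discussed above more concrete, here is a rough sketch of how an output script could be sorted into these categories by matching the standard byte templates. The opcode values are the standard ones. The function itself is hypothetical and deliberately simplified: it does not validate the public keys inside a multi-sig script, and it does not detect stealth metadata. Any script that matches none of the templates falls into 'nonstandard', which is where outputs with no decodable pay-to address would end up.

```python
# Standard script opcodes used by the templates below.
OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG = 0x76, 0xA9, 0x88, 0xAC
OP_EQUAL, OP_CHECKMULTISIG, OP_RETURN = 0x87, 0xAE, 0x6A
OP_1, OP_16 = 0x51, 0x60

def classify_output_script(s: bytes) -> str:
    """Roughly classify a raw output script by matching the common standard templates."""
    # Pay-to-public-key-hash: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if (len(s) == 25 and s[:3] == bytes([OP_DUP, OP_HASH160, 20])
            and s[23:] == bytes([OP_EQUALVERIFY, OP_CHECKSIG])):
        return "p2pkh"
    # Pay-to-script-hash: OP_HASH160 <20 bytes> OP_EQUAL
    if len(s) == 23 and s[:2] == bytes([OP_HASH160, 20]) and s[22] == OP_EQUAL:
        return "p2sh"
    # Pay-to-public-key: <33- or 65-byte public key> OP_CHECKSIG
    if len(s) in (35, 67) and s[0] == len(s) - 2 and s[-1] == OP_CHECKSIG:
        return "p2pk"
    # Bare multi-sig: OP_m <public keys...> OP_n OP_CHECKMULTISIG (keys not validated here)
    if (len(s) >= 3 and s[-1] == OP_CHECKMULTISIG
            and OP_1 <= s[0] <= OP_16 and OP_1 <= s[-2] <= OP_16):
        return "multisig"
    # OP_RETURN outputs carry data and are provably unspendable.
    if s[:1] == bytes([OP_RETURN]):
        return "op_return"
    return "nonstandard"

p2pkh = bytes([OP_DUP, OP_HASH160, 20]) + bytes(20) + bytes([OP_EQUALVERIFY, OP_CHECKSIG])
p2sh = bytes([OP_HASH160, 20]) + bytes(20) + bytes([OP_EQUAL])
one_of_two = bytes([OP_1]) + (bytes([33]) + bytes(33)) * 2 + bytes([OP_1 + 1, OP_CHECKMULTISIG])
for script in (p2pkh, p2sh, one_of_two, b"\x00\x01\x02"):
    print(classify_output_script(script))
```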
Zombie Events for 2014
In 2014, about 78,000 bitcoins 'rose from the dead', meaning they sat in a bitcoin address that remained untouched for over three years before a spend transaction was performed against it. By comparison, 62,000 bitcoins rose from the dead in 2013.
To make zombie events easier to find, I created a formula I call the 'zombie score'. It is the number of bitcoins held at a zombie address multiplied by the square of the number of days since the address was last used:
$$\text{ZombieScore} = \text{TotalBitcoins} \times \text{DaysLastUsed}^2$$
With this formula, even if someone moves just 50 bitcoins, those coins get higher visibility if they date from very early in the life of the blockchain than a similar quantity that is not nearly as old.
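Here is a minimal sketch of the zombie score in Python, applied to the two largest 2014 events described below. The function names are hypothetical. Treating "over three years" as more than 3 × 365 days is an assumption. Because the age is squared, coins that sat twice as long score four times as high.

```python
from datetime import date

# Assumption: "untouched for over three years" approximated as more than 3 * 365 days.
ZOMBIE_AGE_DAYS = 3 * 365

def days_dormant(last_used: date, spent_on: date) -> int:
    """Days between an address's previous transaction and the spend that woke it up."""
    return (spent_on - last_used).days

def is_zombie(days_last_used: int) -> bool:
    return days_last_used > ZOMBIE_AGE_DAYS

def zombie_score(total_bitcoins: float, days_last_used: int) -> float:
    """ZombieScore = TotalBitcoins * DaysLastUsed^2."""
    return total_bitcoins * days_last_used ** 2

# The two largest 2014 events described below.
april = days_dormant(date(2011, 3, 18), date(2014, 4, 2))       # 1111 days
december = days_dormant(date(2011, 6, 11), date(2014, 12, 30))  # 1298 days
print(april, zombie_score(10_000, april))       # 1111 12343210000
print(december, zombie_score(5_000, december))  # 1298 8424020000
```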
The major zombie events of this past year are collated in the following small spreadsheet. It includes the public bitcoin address of each event if you want to dig deeper. link
The largest single zombie event in 2014 happened on April 2 when 10,000 bitcoins which had not been touched since March 18, 2011 decided to come to life.
Key: 1EyArywoLEhFto6uKWMxA9QKXJWLTNghsz link
The second largest zombie event snuck in right at the end of the year on December 30, when 5,000 bitcoins which had not been touched since June 11, 2011 woke up.
Key: 19XZS4EKsFEg2RL3sE5kUiLmUYRz4vWGxY link
On June 19th, someone poked a stick at their zombie coins, untouched since August 2, 2010, just to make sure they could be resurrected. They performed an incredibly tiny test spend against this key: 1KewyNAuJqeDiUDzrkNzKjmr5vrxfuWMa1 link, spending just 0.01 bitcoins of their 1,210 bitcoin stash. Since then that person has made a number of additional transactions, and the key is quite active today.
Finally, what I find most fascinating of all are the zombie bitcoins created in just the first couple of weeks of bitcoin's history. These early blocks could only have been mined by a handful of people who were testing the software. I continue to be amazed that someone who mined bitcoins in the first couple of weeks of the blockchain's history could just sit on them, patiently, all of this time, without ever touching them in any way, not even to move them to a more secure wallet location.
On two days in 2014, October 14th and 18th, someone who mined bitcoins as early as January 29, 2009, finally decided it was time to move their stash. Here is a link to one of the transactions; you can find the rest in the spreadsheet provided above.
Key: 1KiLuqDytoMHu1KRzjh71TRbYraZHSP2xC link
The final section of this article presents data on the 'velocity' of value in the blockchain. This is a measure of how many bitcoins are associated with addresses which have been used over various periods of time.
About a month ago I presented a chart which, quite frankly, I feel was widely misinterpreted. A popular belief holds that almost no bitcoins actually move on the network because most people are simply 'holding'. My goal was to show that, contrary to that belief, a great many bitcoins in fact move on a daily, weekly, and monthly basis.
I'll let the data speak for itself here; what follows is a series of graphs at different levels of detail showing these trends.
We start with this classic exploded view pie chart. It is important to understand what this chart is trying to convey before we drill down into the data more deeply.
This chart shows all bitcoins in existence broken down by the 'age of last send transaction' associated with their bitcoin address. As previously discussed, many of the bitcoins mined in the first couple of years of the blockchain have never been spent. They make up the 'Four to Six years' slice of the pie, which holds 15% of all bitcoins in existence. As you review the following charts, you will note that this number never moves. The bulk of these bitcoins are presumed to belong to the creator of bitcoin, Satoshi Nakamoto, and no one knows for sure if they will ever be spent.
As you examine this chart, you will see that on the day it was created (December 31, 2014), 4% of all bitcoins in existence were touched, 2% in the previous week, and 9% in the previous month.
One way to think of this data is as the 'velocity' of value in bitcoin; meaning how fast does value move through the network over time. Sure, a large quantity of bitcoins are stuck, either belonging to Satoshi, lost forever, or in the hands of investors who have no intention of touching them under any circumstances. The remainder, however, we can see 'move' on the blockchain as spend transactions are performed over time.
Now, here is that same data, but instead of a pie chart it is shown as a stacked area chart with an entry for every single day over the lifetime of the blockchain.
Drilling in a little deeper, this chart zooms in on just the past year, 2014, and shows the velocity of bitcoins with ages of six months or less in absolute terms.
For another view, instead of looking at absolute values we can express this in terms of percentages.
Here we can see that roughly 25% of all bitcoins are in motion in any three-month period, and between 5% and 10% in any given week.
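The idea behind these charts can be sketched as a bucketing step. Each address's balance is assigned to an age bucket based on the time since its last send transaction, and then each bucket's share of the total is reported. This is a hedged sketch. The bucket edges are hypothetical and only loosely mirror the chart labels. For never-spent addresses, it assumes the age is measured from when they first received coins.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket edges, in days since an address's last send transaction
# (for never-spent addresses, assumed to be days since they first received coins).
EDGES = [1, 7, 30, 90, 180, 365, 730, 1460]
LABELS = ["day", "week", "month", "3 months", "6 months",
          "1 year", "2 years", "4 years", "over 4 years"]

def age_bucket(age_days: int) -> str:
    """Map an age in days to the smallest bucket whose upper edge covers it."""
    return LABELS[bisect_left(EDGES, age_days)]

def velocity_shares(balances):
    """balances: iterable of (balance_sat, age_days). Returns {bucket: share of all value}."""
    totals = defaultdict(int)
    for balance_sat, age_days in balances:
        totals[age_bucket(age_days)] += balance_sat
    grand = sum(totals.values())
    if grand == 0:
        return {}
    return {label: totals[label] / grand for label in LABELS if totals.get(label)}

# Toy data, not real blockchain figures.
print(velocity_shares([(50, 0), (30, 3), (20, 2000)]))
# {'day': 0.5, 'week': 0.3, 'over 4 years': 0.2}
```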
Well, it turns out that I have no strong conclusions to present here. My goal was to provide raw data and let the community decide for themselves how best to interpret it.
The fact that roughly 25% of all bitcoins change hands every few months seems like the sign of a positive and healthy economy to me. I have heard others claim that it is a sign of something pathological; I'm not completely sure. But I doubt that 25% of any other 'store of value' commodity changes hands that often.
What I found most interesting in this latest data is the massive number of bitcoins moving to 'pay-to-script' addresses (800,000 in total). I don't necessarily know what that 'means', but it is certainly a new development.
So many new addresses are now generated daily, likely a sign of heavy use of HD wallets, mixers, stealth addresses, and other technologies that obscure the chain of ownership as value is transferred. Because of this, I do feel it is going to become increasingly difficult for people to analyze the blockchain in the future. None of my analysis to date has really concerned itself with the chain of ownership; I simply collect how value is distributed at any given point in time and graph it.
Once again, if you want to validate that the code which produced this data is correct, you are free to download the open source tool located here:
Blockchain Parser link
All of the data used to produce these graphs is available here in raw form if you want to confirm it or make graphs of your own. You are free to use these graphs in your own articles; I simply ask that you keep my tip-jar QR code in the corner intact. If you find the data useful (and let me be clear, this was a lot of work to pull together), tips, as always, are appreciated.
Top Pay-To-Script Addresses link
All 60,000+ bitcoin addresses containing a balance of 50btc or more: link
A report of all zombie events in history (this is any time someone does a send transaction on an address which had previously remained untouched for over three years!) link
A report of how new keys versus old keys are showing up in the blockchain daily. link
General statistics of the blockchain relative to size and age of addresses. Used to create many of these graphs. link
Link to all ASCII text in the blockchain: link
Top Zombie Events of 2014 : link