A Gentle Introduction to Blockchain Technology

by antonylewis2015 · September 9, 2015

This article is a gentle introduction to blockchain technology and assumes minimal technical knowledge. It attempts to describe what it is rather than why should I care, which is described in other posts.

Shorter companion pieces to this are:

Update: I have recently published a book, The Basics of Bitcoins and Blockchains which contains an updated version of this blog post and much, much more.

The Basics is an essential guide for anyone who needs to learn about cryptocurrencies, ICOs, and business blockchains. Written in plain English, it provides a balanced and hype-free grounding in the essential concepts behind the revolutionary technology. You can pre-order “The Basics of Bitcoins and Blockchains” now on Amazon.com.

PART 1 – EXECUTIVE SUMMARY

People use the term ‘blockchain technology’ to mean different things, and it can be confusing. Sometimes they are talking about The Bitcoin Blockchain, sometimes it’s The Ethereum Blockchain, sometimes it’s other virtual currencies or digital tokens, sometimes it’s smart contracts. Most of the time though, they are talking about distributed ledgers, i.e. a list of transactions that is replicated across a number of computers, rather than being stored on a central server.

The common themes seem to be a data store which:

usually contains financial transactions
is replicated across a number of systems in almost real-time
usually exists over a peer-to-peer network
uses cryptography and digital signatures to prove identity, authenticity and enforce read/write access rights
can be written by certain participants
can be read by certain participants, maybe a wider audience, and
has mechanisms to make it hard to change historical records, or at least make it easy to detect when someone is trying to do so

I see “blockchain technology” as a collection of technologies, a bit like a bag of Lego. From the bag, you can take out different bricks and put them together in different ways to create different results.

blockchain_bag2 — I see blockchain technology as a bag of Lego or bricks

What’s the difference between a blockchain a a normal database? Very loosely, a blockchain system is a package which contains a normal database plus some software that adds new rows, validates that new rows conform to pre-agreed rules, and listens and broadcasts new rows to its peers across a network, ensuring that all peers have the same data in their databases.

PART 2 – INTRODUCING BITCOIN’S BLOCKCHAIN

The Bitcoin Blockchain ecosystem

As a primer on bitcoin, it may help to review A gentle introduction to bitcoin.

The Bitcoin Blockchain ecosystem is actually quite a complex system due to its dual aims: that anyone should be able to write to The Bitcoin Blockchain; and that there shouldn’t be any centralised power or control. Relax these, and you don’t need many of the convoluted mechanisms of Bitcoin.

That said, let’s start with The Bitcoin Blockchain ecosystem, and then try to tease out the blockchain bit from the bitcoin bit.

Replicated databases. The Bitcoin Blockchain ecosystem acts like a network of replicated databases, each containing the same list of past bitcoin transactions. Important members of the network are called validators or nodes which pass around transaction data (payments) and block data (additions to the ledger). Each validator independently checks the payment and block data being passed around. There are rules in place to make the network operate as intended.

Bitcoin’s complexity comes from its ideology. The aim of bitcoin was to be decentralised, i.e. not have a point of control, and to be relatively anonymous. This has influenced how bitcoin has developed. Not all blockchain ecosystems need to have the same mechanisms, especially if participants can be identified and trusted to behave.

Here’s how bitcoin approaches some of the decisions:

Public vs private blockchains

There is a big difference in what technologies you need, depending on whether you allow anyone to write to your blockchain, or known, vetted participants. Bitcoin in theory allows anyone to write to its ledger (but in practice, only about 20 people/groups actually do).

Public blockchains. Ledgers can be ‘public’ in two senses:

Anyone, without permission granted by another authority, can write data
Anyone, without permission granted by another authority, can read data

Usually, when people talk about public blockchains, they mean anyone-can-write.

Because bitcoin is designed as a ‘anyone-can-write’ blockchain, where participants aren’t vetted and can add to the ledger without needing approval, it needs ways of arbitrating discrepancies (there is no ‘boss’ to decide), and defence mechanisms against attacks (anyone can misbehave with relative impunity, if there is a financial incentive to do so). These create cost and complexity to running this blockchain.

Private blockchains. Conversely, a ‘private’ blockchain network is where the participants are known and trusted: for example, an industry group, or a group of companies owned by an umbrella company. Many of the mechanisms aren’t needed – or rather they are replaced with legal contracts – “You’ll behave because you’ve signed this piece of paper.”. This changes the technical decisions as to which bricks are used to build the solution.

Another way of describing public/private might be permissionless vs permissioned or pseudonymous vs identified participants.

See the pros and cons of internal blockchains or the difference between a distributed ledger and a blockchain for more on this topic.

PART 3 – MORE DEPTH, PLEASE

Warning: this section isn’t so gentle, as it goes into detail into each of the elements above. I recommend getting a cup of tea.

DATA STORAGE: What is a blockchain?

A blockchain is just a file. A blockchain by itself is just a data structure. That is, how data is logically put together and stored. Other data structures are databases (rows, columns, tables), text files, comma separated values (csv), images, lists, and so on. You can think of a blockchain competing most closely with a database.

Blocks in a chain = pages in a book
For analogy, a book is a chain of pages. Each page in a book contains:

the text: for example the story
information about itself: at the top of the page there is usually the title of the book and sometimes the chapter number or title; at the bottom is usually the page number which tells you where you are in the book. This ‘data about data’ is called meta-data.

Similarly in a blockchain block, each block has:

the contents of the block, for example in bitcoin is it the bitcoin transactions, and the miner incentive reward (currently 25 BTC).
a ‘header’ which contains the data about the block. In bitcoin, the header includes some technical information about the block, a reference to the previous block, and a fingerprint (hash) of the data contained in this block, among other things. This hash is important for ordering.

books_and_blocks — Blocks in a chain refer to previous blocks, like page numbers in a book.

See this infographic for a visualisation of the data in Bitcoin’s blockchain.

Block ordering in a blockchain

Page by page. With books, predictable page numbers make it easy to know the order of the pages. If you ripped out all the pages and shuffled them, it would be easy to put them back into the correct order where the story makes sense.

Block by block. With blockchains, each block references the previous block, not by ‘block number’, but by the block’s fingerprint, which is cleverer than a page number because the fingerprint itself is determined by the contents of the block.

ordering — The reference to previous blocks creates a chain of blocks – a blockchain!

Internal consistency. By using a fingerprint instead of a timestamp or a numerical sequence, you also get a nice way of validating the data. In any blockchain, you can generate the block fingerprints yourself by using some algorithms. If the fingerprints are consistent with the data, and the fingerprints join up in a chain, then you can be sure that the blockchain is internally consistent. If anyone wants to meddle with any of the data, they have to regenerate all the fingerprints from that point forwards and the blockchain will look different.

one_block — A peek inside a blockchain block: the fingerprints are unique to the block’s contents.

This means that if it is difficult or slow to create this fingerprint (see the “making it hard for baddies to be bad” section), then it can also be difficult or slow to re-write a blockchain.

The logic in bitcoin is:

Make it hard to generate a fingerprint that satisfies the rules of The Bitcoin Blockchain
Therefore, if someone wants to re-write parts of The Bitcoin Blockchain, it will take them a long time, and they have to catchup with and overtake the rest of the honest network

This is why people say The Bitcoin Blockchain is immutable (can not be changed)*.

* Here’s a piece on immutability in blockchains.

DATA DISTRIBUTION: How is new data communicated?

Peer to peer is one way of distributing data in a network. Another way is client-server. You may have heard of peer-to-peer file sharing on the BitTorrent network where files are shared between users, without a central server controlling the data. This is why BitTorrent has remained resilient as a network: there is no central server to shut down.

Client-server
In the office environment, often data is held on servers, and wherever you log in, you can access the data. The server holds 100% of the data, and the clients trust that the data is definitive. Most of the internet is client-server where the website is held on the server, and you are the client when you access it. This is very efficient, and a traditional model in computing.

Peer-to-peer
In peer-to-peer models, it’s more like a gossip network where each peer has 100% of the data (or as close to it as possible), and updates are shared around. Peer-to-peer is in some ways less efficient than client-server, as data is replicated many times; once per machine, and each change or addition to the data creates a lot of noisy gossip. However each peer is more independent, and can continue operating to some extent if it loses connectivity to the rest of the network. Also peer-to-peer networks are more robust, as there is no central server that can be controlled, so closing down peer-to-peer networks is harder.

The problems with peer-to-peer
With peer-to-peer models, even if all peers are ‘trusted’, there can be a problem of agreement or consensus – if each peer is updating at different speeds and have slightly different states, how do you determine the “real” or “true” state of the data?

Worse, in an ‘untrusted’ peer-to-peer network where you can’t necessarily trust any of the peers, how do you ensure that the system can’t easily be corrupted by bad peers?

CONSENSUS: How do you resolve conflicts?

A common conflict is when multiple miners create blocks at roughly the same time. Because blocks take time to be shared across the network, which one should count as the legit block?

Example. Let’s say all the nodes on the network have synchronised their blockchains, and they are all on block number 80.
If three miners across the world create ‘Block 81’ at roughly the same time, which ‘Block 81’ should be considered valid? Remember that each ‘Block 81’ will look slightly different: They will certainly contain a different payment address for the 25 BTC block reward; and they may contain a different set transactions. Let’s call them 81a, 81b, 81c.

three_blocks — Which block should count as the legit one?

How do you resolve this?

Longest chain rule. In bitcoin, the conflict is resolved by a rule called the “longest chain rule”.

In the example above, you would assume that the first ‘Block 81’ you see is valid. Let’s say you see 81a first. You can start building the next block on that, trying to create 82a:

mine_on_first — Treat the first block you see as legitimate.

However in a few seconds you may see 81b. If you see this, you keep an eye on it. If later you see 82b, the “longest chain rule” says that you should regard the longer ‘b’ chain as the valid one (…80, 81b, 82b) and ignore the shorter chain (…80, 81a). So you stop trying to make 82a and instead start trying to make 83b:

mine_on_longest — Longest chain rule: If you see multiple blocks, treat the longest chain as legitimate.

The “longest chain rule” is the rule that the bitcoin blockchain ecosystem uses to resolve these conflicts which are common in distributed networks.

However, with a more centralised or trusted blockchain network, you can make decisions by using a trusted, or senior validator to arbitrate in these cases.

See a gentle introduction to bitcoin mining for more detail.

UPGRADES: How do you change the rules?

As a network as a whole, you must agree up front what kind of data is valid to be passed around, and what is not. With bitcoin, there are technical rules for transactions (Have you filled in all the required data fields? Is it in the right format? etc), and there are business rules (Are you trying to spend more bitcoins than you have? Are you trying to spend the same bitcoins twice?).

Rules change. As these rules evolve over time, how will the network participants agree on the changes? Will there be a situation where half the network thinks one transaction is valid, and the other half doesn’t think so because of differences in logic?

In a private, controlled network where someone has control over upgrades, this is an easy problem to solve: “Everyone must upgrade to the new logic by 31 July”.

However in a public, uncontrolled network, it’s a more challenging problem.

With bitcoin, there are two parts to upgrades.

Suggest the change (BIPs). First, there is the proposal stage where improvements are proposed, discussed, and written up. A proposal is referred to as a “BIP” – a “Bitcoin Improvement Proposal”.
If it gets written into the Bitcoin core software on Github, it can then form part of an upgrade – the next version of “Bitcoin core” which is the most common “reference implementation” of the protocol.
Adopt the change (miners). The upgrade can be downloaded by nodes and block makers (miners) and run, but only if they want to (you could imagine a change which reduces the mining reward from 25 BTC per block to 0 BTC. We’ll see just how many miners choose to run that!).

If the majority of the network (in bitcoin, the majority is determined by computational power) choose to run a new version of the software, then new-style blocks will be created faster than the minority, and the minority will be forced to switch or become irrelevant in a “blockchain fork”. So miners with lots of computational power have a good deal of “say” as to what gets implemented.

WRITE ACCESS: How do you control who can write data?

In the bitcoin network, theoretically anyone can download or write some software and start validating transactions and creating blocks. Simply go to https://bitcoin.org/en/download and run the “Bitcoin core” software.

Your computer will act as a full node which means:

Connecting to the bitcoin network
Downloading the blockchain
Storing the blockchain
Listening for transactions
Validating transactions
Passing on valid transactions
Listening for blocks
Validating blocks
Passing on valid blocks
Creating blocks
‘Mining’ the blocks

The source code to this “Bitcoin core” software is published on Github: https://github.com/bitcoin/bitcoin. If you are so inclined, you can check the code and compile and run it yourself instead of downloading the prepackaged software on bitcoin.org. Or you can even write your own code, so long as it conforms to protocol.

Ethereum works in a similar way in this respect – see a gentle introduction to Ethereum.

Permissionless
Note that you don’t need to sign up, log in, or apply to join the network. You can just go ahead and join in. Compare this with the SWIFT network, where you can’t just download some software and start listening to SWIFT messages. In this way, some call bitcoin ‘permissionless’ vs SWIFT which would be ‘permissioned’.

Permissionless is not the only way
You may want to use blockchain technology in a trusted, private network. You may not want to publish all the rules of what a valid transaction or block looks like. You may want to control how the network rules are changed. It is easier to control a trusted private network than an untrusted, public free-for-all like bitcoin.

DEFENCE: How do you make it hard for baddies?

A problem with a permissionless, or open networks is that they can be attacked by anyone. So there needs to be a way of making the network-as-a-whole trustworthy, even if specific actors aren’t.

What can and can’t miscreants do?

A dishonest miner can:

Refuse to relay valid transactions to other nodes
Attempt to create blocks that include or exclude specific transactions of his choosing
Attempt to create a ‘longer chain’ of blocks that make previously accepted blocks become ‘orphans’ and not part of the main chain

He can’t:

Create bitcoins out of thin air*
Steal bitcoins from your account
Make payments on your behalf or pretend to be you

That’s a relief.

*Well, he can, but only his version of the ledger will have this transactions. Other nodes will reject this, which is why it is important to confirm a transaction across a number of nodes.

With transactions, the effect a dishonest miner can have is very limited. If the rest of the network is honest, they will reject any invalid transactions coming from him, and they will hear about valid transactions from other honest nodes, even if he is refusing to pass them on.

With blocks, if the miscreant has sufficient block creation power (and this is what it all hinges on), he can delay your transaction by refusing to include it in his blocks. However, your transaction will still be known by other honest nodes as an ‘unconfirmed transaction’, and they will include it in their blocks.

Worse though, is if the miscreant can create a longer chain of blocks than the rest of the network, and invoking the “longest chain rule” to kick out the shorter chains. This lets him unwind a transaction.

Here’s how you can do it:

Create two payments with the same bitcoins: one to an online retailer, the other to yourself (another address you control)
Only broadcast the payment that pays the retailer
When the payment gets added in an honest block, the retailer sends you goods
Secretly create a longer chain of blocks which excludes the payment to the retailer, and includes the payment to yourself
Publish the longer chain. If the other nodes are playing by the “longest chain rule” rule, then they will ignore the honest block with the retailer payment, and continue to build on your longer chain. The honest block is said to be ‘orphaned’ and does not exist to all intents and purposes.
The original payment to the retailer will be deemed invalid by the honest nodes because those bitcoins have already been spent (in your longer chain)

This is called a “double spend” because the same bitcoins were spent twice – but the second one was the one that became part of the eventual blockchain, and the first one eventually gets rejected.
How do you make it hard for dishonest miners to create blocks?

Remember, this is only a problem for ledgers where block-makers aren’t trusted.

Essentially you want to make it hard, or expensive for baddies to add blocks. In bitcoin, this is done by making it computationally expensive to add blocks. Computationally expensive means “takes a lot of computer processing power” and translates to financially expensive (as computers need to be bought then run and maintained).

The computation itself is a guessing game where block-makers need to guess a number, which when crunched with the rest of the block data contents, results in a hash / fingerprint that is smaller than a certain number. That number is related to the ‘difficulty’ of mining which is related to the total network processing power. The more computers joining in to process blocks, the harder it gets, in a self-regulating cycle.

difficulty_calibration — Every 2,016 blocks (roughly every 2 weeks), the bitcoin network adjusts the difficulty of the guessing game based on the speed that the blocks have been created.

This guessing game is called “Proof of work”. By publishing the block with the fingerprint that is smaller than the target number, you are proving that you did enough guess work to satisfy the network at that point in time.

INCENTIVES: How do you pay validators?

Transaction and block validation is cheap and fast, unless you choose to make it slow and expensive (a la bitcoin).

If you control the validators in your own network, or they are trusted, then

you don’t need to make it expensive to add blocks, and
therefore you can reduce the need to incentivise them

You can use other methods such as “We’ll pay people to run validators” or “People sign a contract to run validators and behave”.

Because of bitcoin’s ‘public’ structure, it needs a defence against miscreants and so uses “proof of work” to make it computationally difficult to add a block (see Defence section). This has created a cost (equipment and running costs) of mining and therefore a need for incentivisation.

Just as the price of gold determines how much equipment you can spend on a gold mine, bitcoin’s price determines how much mining power is used to secure the network. The higher the price, the more mining there is, and the more a miscreant has to spend to bully the network.

So, miners do lots of mining, increasing the difficulty and raising the walls against network attacks. They are rewarded in bitcoin according to a schedule, and in time, as the block rewards reduce, transaction fees become the incentive that miners collect.

tx_fees_replace_block_rewards — The idealised situation in Bitcoin where block rewards are replaced by transaction fees.

This is all very well in theory, but the more you look into this, the more interesting it gets, and with the bitcoin solution, the incentives may not quite have worked as expected. This is something for another article…

CONCLUSION

It is useful to understand blockchains in the context of bitcoin, but you should not assume that all blockchain ecosystems need bitcoin mechanisms such as tokens, proof of work mining, longest chain rule, etc. Bitcoin is the first attempt at maintaining a decentralised, public ledger with no formal control or governance. Ethereum is the next iteration of a blockchain with smart contracts. There are significant challenges involved.

On the other hand, private or internal distributed ledgers and blockchains can be deployed to solve other sets of problems. As ever, there are tradeoffs and pros and cons to each solution, and you need to consider these individually for each individual use case.

If you have a specific business problem which you think may be solvable with a blockchain, I would love to hear about this: please contact me.

Acknowledgments

With thanks to David Moskowitz, Tim Swanson, Roberto Capodieci. Errors, omissions, and simplifications are mine.

Tags: bitcoin block rewards consensus cryptography digital signatures distributed ledgers immutable incentives longest chain rule p2p participation proof of work

Marco Morana says:

September 12, 2015 at 9:17 pm

Very nice article. I like the “other methods” column. I personally think these other methods would need a in-depth analysis by a security expert expecially as the proposed trusted models do not specify how authorization will be implemented in the closed trust model vs. current open trust model used in bitcoin blockchain. Also security controls to protect the new blockchain need to be derived based upon a threat model to identify the attacks and design the controls specifically for securing a client to server blockchain architecture vs. current peer to peer as implemented in bitcoin blockchain

Loading...

Reply
- antonylewis2015 says:
  
  September 12, 2015 at 9:47 pm
  
  Thanks Marco, I agree there is a lot to think about: no doubt the industry will get things wrong during the teething period, but with a lot of the parts of the system using cryptographic signatures, I am confident we will have more secure and transparent models in the future than we currently do.
  
  Loading...
  
  Reply
sohanagate says:

March 2, 2016 at 10:23 pm

I think your article is just excellent. I am citing it in my Masters thesis.

Loading...

Reply
- antonylewis2015 says:
  
  March 3, 2016 at 8:24 am
  
  Thank you! I’m glad it’s helpful 🙂
  
  Loading...
  
  Reply
- tamim2996 says:
  
  December 2, 2016 at 4:37 pm
  
  Hello Sohana I am Tamim. I am doing master thesis on the same topic. It will be very nice to me if you kindly share some ideas with me. Here is my email address tamim2996@gmail.com
  
  Loading...
  
  Reply
Florent Morin says:

May 3, 2016 at 12:28 am

Reblogged this on Morin Innovation and commented:
Article très intéressant pour comprendre la technologie blockchain

Loading...

Reply
Sharan says:

May 17, 2016 at 2:33 am

Thanks. Simple and easy to follow introduction of blockchain.

Loading...

Reply
Griyabayar | PPOB terbaik di Indonesia adalah PPOB BTN says:

June 16, 2016 at 1:15 am

Thanks for finally writing about >A gentle introduction to blockchain technology |
Bits on blocks <Liked it!

Loading...

Reply
Fredrick Roswold says:

September 3, 2016 at 6:53 am

Thanks for writing this article; I am starting to “get it” about block chain, after hearing the term and reading cryptic characterizations, which were equally baffling in themselves, for years. I’d guess most people have had no idea, sort of like me. Still loads and loads of questions in my mind (like who creates a bit coin? How? Isn’t the requirement to keep the whole history of all blocks and all bit coins on all the computers of every network node rather overkill just to rid ourselves of the central authority? Seems inefficient. What happens when most of these network nodes get bored with it and switch off their computers? What is the vision of the end of this thing? And other questions) but I’ll keep digging and I know I’ll reach a level of understanding which satisfies my curiosity.

Loading...

Reply
Madhava says:

September 3, 2016 at 4:01 pm

The article is awesome and complex Blockchain is explained in simple and easy way. Would you please share the technologies to implement it, is it ethereum + solidity are enough to develop a Blockchain application.

Loading...

Reply
Dik Langan says:

September 8, 2016 at 11:40 pm

Hi, so how do the miners (or the software they use to build the blocks) verify that you are spending bitcoins you have? To do this they must know how many coins before the transaction which suggests they have to look at the whole chain from the beginning of time (well, the beginning of BitCoin) and look for every transaction relating to your “account” which must take huge amounts of storage (to store the whole BitCoin chain) and processing power (to either cycle through every single block every time or alternatively keep a record of how many BitCoins every single participant has at any moment in time. The only other alternative that I can imagine is that each transaction includes a start and end balance for both participants, then all you have to do is go back through the chain until you find a previously verified transaction for each participant and validate the start balance for this new transaction matches the end balance for the previous one for each person. I can see that might be the solution as each verified block is being trusted so you don’t need to find anything before the previous transaction for each. Sorry as I think I have answered my own question, but only in theory, it ould be nice if you could confirm it for me 🙂 Thanks for a great article and glad it appears to have been written by a fellow Brit who appreciates the benefit a cup of tea has to a reader’s understanding.

Loading...

Reply
rupirupi says:

October 19, 2016 at 9:04 pm

best description and explanation so far 🙂

Loading...

Reply
tamim2996 says:

December 2, 2016 at 9:23 pm

What happen nwith the blocks those are discarded using “longest chain rule”? Sender has to make the transaction again? Kindly explain.

Loading...

Reply
- antonylewis2015 says:
  
  December 16, 2016 at 7:17 am
  
  Your transaction may have found its way into the included block, or if not, it may get included into future blocks. If your transaction was valid in the first place, and the accepted block doesn’t do anything to invalidate your transaction (like by including a transaction that spends those coins elsewhere), thenyour transaction will still be included in a future block, so it’s more like a delay in processing.
  
  Loading...
  
  Reply
Nick M. says:

December 15, 2016 at 6:02 am

Very informative article but I personally would have found it more beneficial if the author did not go back and forth between blockchain and Bitcoin.

Loading...

Reply
Amgad Wagdy says:

February 10, 2018 at 7:15 pm

this is definitely the best description for Blockchain for beginners, thank you

Loading...

Reply
Shekhar Sahu says:

August 16, 2018 at 4:11 pm

This is by far the most comprehensive and simplest article I read about block chain technology.

Loading...

Reply
Robert Bint says:

October 3, 2018 at 4:45 pm

Hi,

That’s a great post!
Very well researched and briefed information about blockchain. Its great to see such post which are very helpful for people who have started learning this technology.

Thanks for sharing

Loading...

Reply

A Gentle Introduction to Blockchain Technology

PART 1 – EXECUTIVE SUMMARY