So you want to use a blockchain for that?
There are good reasons and bad reasons to use blockchains. In conversations with people thinking about blockchain use cases, I have noticed common confusions and conflations that arise when words first used in a narrow context (usually to describe bitcoin’s blockchain) are assumed to apply to blockchains in general. In this post I hope to untangle some of these common misconceptions.
Theme: Blockchains are secure
Bitcoin has some specific security for writing data due to the burden of proof-of-work. That is, in order to add a block of transactions, you have to validate all the transactions within the block (easy) and then perform repeated calculations (called hashing) to find a magic number that makes your block valid and acceptable to the other participants according to the rules of the network (conceptually simple, but computationally expensive, therefore energy intensive, therefore costly). This proof-of-work burden, combined with the longest chain rule, makes it expensive to mine your own subversive chain.
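To make the "magic number" search concrete, here is a minimal sketch of proof-of-work. It is a simplification and not bitcoin's actual block format: it assumes a single SHA-256 over the raw block bytes plus an 8-byte nonce, and the names `mine`, `is_valid` and `difficulty_bits` are made up for illustration.

```python
import hashlib


def block_hash(block_data: bytes, nonce: int) -> int:
    # Simplified: hash the block bytes followed by an 8-byte nonce.
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def is_valid(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    # A block is acceptable if its hash starts with `difficulty_bits` zero bits.
    return block_hash(block_data, nonce) < (1 << (256 - difficulty_bits))


def mine(block_data: bytes, difficulty_bits: int) -> int:
    # Brute force: try nonces one by one until the hash is small enough.
    nonce = 0
    while not is_valid(block_data, nonce, difficulty_bits):
        nonce += 1
    return nonce


data = b"alice pays bob 2 BTC"
nonce = mine(data, difficulty_bits=16)   # many hashes needed to find it
print(is_valid(data, nonce, 16))         # a single hash to verify it
```

The asymmetry is the point: finding the nonce takes many attempts, while checking it takes one hash. Each extra bit of difficulty doubles the expected work for the block-adder but not for the verifier.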
Private chains, on the other hand, have known block-adders and may replace proof-of-work with other mechanisms that limit the ability to subvert the chain. These rules can specify that blocks need to be signed by a limited, known list of signatories, in some round-robin fashion. Knowing which entity signed which block, with rules in place that make entities take turns to write blocks, is enough to discourage or limit unilateral bad behaviour.
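A round-robin rule of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: the signer names are hypothetical, the turn is derived from the block height, and verification of the signer's digital signature is deliberately left out.

```python
# Hypothetical list of known block-adders, in their agreed turn order.
SIGNERS = ["bank_a", "bank_b", "bank_c"]


def expected_signer(height: int) -> str:
    # The block at a given height must come from the signer whose turn it is.
    return SIGNERS[height % len(SIGNERS)]


def accept_block(height: int, signer: str) -> bool:
    # A real system would also verify the signer's digital signature over
    # the block contents; that step is omitted in this sketch.
    return signer == expected_signer(height)


print(accept_block(0, "bank_a"))  # True: it is bank_a's turn
print(accept_block(1, "bank_a"))  # False: bank_a cannot write two blocks in a row
```

Because no single signer can write consecutive blocks, rewriting history would require the other signatories to collude.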
Bitcoin, and blockchains, do not have inherent security against read access. Indeed, blockchains are mechanisms for copying data to all relevant participants – this is what consensus is all about.
If you think you have cybersecurity headaches controlling read-access to one central database, then multiply that by the number of nodes in your blockchain to get the new attack surface area of your blockchain.
You can control read-access to some degree by encrypting certain elements on your blockchain and handing out the keys to the relevant participants. But consider the threat of industrial espionage where keys are sold to a rival organisation who also runs a node – now the rival can read your data without even penetrating your system, because the blockchain is copying the data right into their data centre! There may be solutions here involving key-rotation, but historical data also needs considering. By contrast, the value of a trusted third party holding the data centrally is that they can control access to the data more finely. They also provide a single entity to litigate against if they expose private data or breach their contractual obligations.
Denial of service
Blockchains are more resilient than centralised systems against denial-of-service attacks, due to their peer-to-peer, multi-redundant nature. If one node is taken offline, the others keep working. Users connected to the disabled node will lose access, however, unless there is a mechanism in place for them to fall back to other nodes.
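A fallback mechanism can be as simple as a client-side list of nodes tried in order. The sketch below assumes hypothetical node names and a caller-supplied `is_up` probe instead of real network calls.

```python
from typing import Callable, Sequence


def first_reachable(nodes: Sequence[str], is_up: Callable[[str], bool]) -> str:
    # Try each known node in order and use the first one that responds.
    for node in nodes:
        if is_up(node):
            return node
    raise ConnectionError("no node reachable")


# Hypothetical node list; node-a has been taken offline by an attack.
nodes = ["node-a.example", "node-b.example", "node-c.example"]
offline = {"node-a.example"}

print(first_reachable(nodes, lambda n: n not in offline))  # node-b.example
```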
Theme: Blockchains are encrypted
There can be confusion between the cryptographic methods used in bitcoin (hashing, digital signatures) and data on blockchains being encrypted (data stored as cyphertext). This can lead to people thinking that data on a blockchain is by default encrypted.
In fact, data on blockchains is by default not encrypted, especially data that needs to be validated by the nodes. In bitcoin, transaction data is not encrypted, as you can see by looking at any transaction in bitcoin’s blockchain. For a deeper explanation of the specific elements in a bitcoin transaction see here.
The most apparent problem with encrypting data on a blockchain is that encrypted data can’t be validated, because nodes need to know what they are validating. For example, if I am validating the legitimacy of your payment of 2 BTC from your wallet, I need to know the contents of your wallet (i.e. your previous inbound transactions) and the fact that you are trying to spend 2 BTC (and which ones).
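The following sketch shows why a validating node needs the data in the clear. It is a heavily simplified model, not bitcoin's real transaction format: the `Output` type, the `validate_payment` helper and the transaction ids are invented for illustration, amounts are in satoshis (1 BTC = 100,000,000), and signature checks are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    owner: str
    amount: int  # in satoshis


def validate_payment(unspent: dict, spender: str, inputs: list, amount: int) -> bool:
    # The validator must read every referenced inbound transaction:
    # does it exist, is it unspent, does it belong to the spender?
    total = 0
    seen = set()
    for tx_id in inputs:
        out = unspent.get(tx_id)
        if out is None or out.owner != spender or tx_id in seen:
            return False
        seen.add(tx_id)
        total += out.amount
    # ...and it must read the amounts to check the payment is covered.
    return total >= amount


unspent = {
    "tx1": Output("alice", 150_000_000),
    "tx2": Output("alice", 80_000_000),
    "tx3": Output("bob", 500_000_000),
}
two_btc = 200_000_000
print(validate_payment(unspent, "alice", ["tx1", "tx2"], two_btc))  # True
print(validate_payment(unspent, "alice", ["tx1"], two_btc))         # False: not enough
print(validate_payment(unspent, "alice", ["tx3"], two_btc))         # False: not hers
```

Every check reads an owner or an amount. If those fields were ciphertext, the node could do none of this without the decryption key.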
In a private chain, if all validating nodes can decrypt your data by having decryption keys, then you need to consider why you are encrypting it in the first place.
There are solutions emerging from primary cryptographic research for the ability to prove facts about data without knowing the underlying data itself, known as zero-knowledge proofs, but this technology is not currently mature.
If privacy is important, then consider what needs to be encrypted: All data at rest? Data in motion? The whole database? Data within specific database fields? And who will be able to decrypt it and when? How will permissions be granted? Can permissions be revoked? What happens if a third party gets a decryption key through a rogue staff member? What happens if a legitimate user loses a decryption key?
Key management is a crucial part of data security – even more so when the data is freely shared between (usually) competitors in an industry, and needs to be carefully considered in a blockchain solution.
Theme: Using a blockchain allows better access to data
Many existing centralised solutions already do an excellent job of allowing access to data, with carefully controlled read and write access, and also a layer of accountability on the central owner of the data, who can react to either moral imperatives or legal directives. For example, Facebook is accessible globally and can take down hate speech or copyrighted material.
Blockchains can make access control to data more complex, and immutability is not without its downsides. In many potential use cases being examined, nodes are run by separate entities or groups (if they’re not, then consider why you’re using a blockchain in the first place), and each entity controls and manages its own access control to the data. There may be challenges around managing access control across all entities that have a copy of the blockchain data.
Theme: This blockchain allows end users to do [x] peer-to-peer without a middle man
This narrative seems to have come from bitcoin’s whitepaper, which describes the purpose of bitcoin as allowing people to send digital cash from person to person without a specific financial intermediary. If you count the miner adding the block as an intermediary who collects fees and rewards for their work, then there are intermediaries in bitcoin, but the point is that they are not specific (one miner can substitute for another), and you are not beholden to a specific miner for your transactions to work.
For many private blockchains currently being described in industry, there are middle men – these are the participants running the nodes, or the technology vendors clipping tickets to monetise their blockchain solutions.
Theme: Users will run their own blockchains on their phones
I have occasionally heard ideas where users need to store blockchain data on their phones (especially for use cases where users should own their own data). Beware the mobile phone blockchain, as it implies that the phone will be constantly chatting to the rest of the network, downloading and uploading other people’s data non-stop to remain in consensus.
Theme: The blockchain will be an immutable record of all events
In bitcoin, where old transactions need to be tracked in order to figure out the validity of new transactions, this is the case. It is also the case that a bitcoin transaction only “happens” or settles if it is broadcast to the bitcoin network and is accepted into a block. Each event in bitcoin is a necessary event to build up the picture of the state of the ledger.
This does not mean that if you throw a blockchain at a random problem, you will immediately accurately capture every single event. Events need to be input by someone or something and then broadcast and accepted for them to be recorded. Data on a blockchain doesn’t imply accuracy – events need to be recorded accurately in the first place. This is even more important when the record may be immutable.
Theme: Because it’s on a blockchain, it’s true
This is a confusion around use of the word “true”. In bitcoin “true” means that the network has agreed that a transaction has taken place, and nodes are in agreement or consensus that this has happened.
The concept of “truth” as applied to blockchains doesn’t extend to other meanings of “true”. If a heart-monitoring piece of hardware becomes faulty and records incorrect heart-rate readings onto a blockchain, do the readings become truth? Clearly not.
On a registry of car ownership, a blockchain may immutably record that a car has changed owner. If this transaction was made in error, or fraudulently after the owner’s phone was hacked, what is the state of the truth? If the transaction was found to be fraudulent by the police and needs to be ‘unwound’, then how will that be done, given the cryptographic security of digital signatures? (There are solutions, but they need to be thought through.)
In the case of blockchains, truth just means “what was originally recorded and agreed as valid by the majority of the nodes”. Valid doesn’t necessarily mean true. Don’t confuse blockchain truth with The Truth. For a trivial but concrete example of an immutable lie on multiple levels, see here.
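The heart-monitor example can be sketched to show the gap between "valid" and "true". The validation rule, the field names and the numbers are all made up for illustration. The point is that nodes can only check what a record looks like, not whether it matches reality.

```python
def is_valid_reading(reading: dict) -> bool:
    # Nodes can check structure and plausibility, nothing more.
    return (
        isinstance(reading.get("device_id"), str)
        and isinstance(reading.get("bpm"), int)
        and 20 <= reading["bpm"] <= 250
    )


ledger = []            # stand-in for the agreed, append-only record
actual_bpm = 72        # what the patient's heart is really doing
faulty = {"device_id": "monitor-17", "bpm": 140}  # what the broken sensor reports

if is_valid_reading(faulty):
    ledger.append(faulty)  # accepted by every node: it passes the rules

print(ledger[-1]["bpm"] == actual_bpm)  # False: valid, agreed, and wrong
```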
Theme: Data stored on a blockchain
This is prevalent in the blockchains-for-KYC and blockchains-for-document-storage space.
Comments such as “This is stored on the blockchain” can cause confusion when it is only a hash of a document (pdf, jpeg etc) that is published to a blockchain. A hash is not an encrypted version of an original, and when a hash is stored, you can’t retrieve the original by decrypting the hash. The hash is a fingerprint of the data, and if it is stored on a blockchain, someone who has kept an exact copy of that data (off chain) can prove that that specific data existed at the timestamp when the hash was stored on the blockchain.
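As a rough illustration of hash-as-fingerprint, the sketch below uses SHA-256 from Python's standard library and a plain dictionary as a stand-in for the blockchain record. The document bytes, the timestamp and the helper names are assumptions made for the example.

```python
import hashlib
from typing import Optional


def fingerprint(document: bytes) -> str:
    # A fixed-length digest; the document cannot be recovered from it.
    return hashlib.sha256(document).hexdigest()


# Stand-in for the chain: only the hash and a timestamp are published.
on_chain = {}
original = b"%PDF-1.4 example contract"
on_chain[fingerprint(original)] = "2016-01-01T00:00:00Z"


def existed_at(document: bytes) -> Optional[str]:
    # Anyone holding an exact off-chain copy can recompute the hash
    # and look up when it was recorded.
    return on_chain.get(fingerprint(document))


print(existed_at(original))         # 2016-01-01T00:00:00Z
print(existed_at(original + b" "))  # None: even a one-byte change breaks the match
```

Note what the chain does not hold: the document itself. Lose the off-chain copy and the hash proves nothing to anyone.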
While you can store whole documents on blockchains (after all a blockchain is just a database coupled with software that validates and shares new entries to the other participants), passing large chunks of data around, at speed can create its own set of problems.
Theme: Participants to a blockchain
There can be confusion when the word “participants” is used. Generally speaking there are three main types of participants to blockchains:
- Participants who write blocks (in bitcoin these are called miners and they crunch numbers)
- Participants who maintain the entire blockchain and validate and propagate new entries (in bitcoin these are generally called full nodes)
- Participants who are the end users of the functionality of the blockchain, usually accessing the blockchain by connecting to a full node (in bitcoin these are generally called users)
It may be best to always spell out exactly which participants are being referred to.
Theme: Well what should I use a blockchain for?
Blockchains are great when multiple parties need to read the same information but for whatever reason there can’t be or shouldn’t be any specific individual party in control of that data. Gideon Greenspan has written a great article about avoiding the pointless blockchain project here, and then describes some genuine use cases here.
Theme: If I use the word ‘blockchain’ I can get budget
Go for it! The only way the technology will improve is by people trying it and adapting it to fit problems better. Try to understand and be aware of the limitations and complexities early and be careful about over-fitting a trendy technical solution to a problem.
Nice post. Should be useful explaining things to people we do business with.
Hey, you said, ‘…passing large chunks of data around, at speed can create its own set of problems.’ What kind of problems are those? Would they be inherent problems in the Blockchain, such as speed and needing large storage memory to store the files?
Yes, lots of transactions containing large files, needing bandwidth that scales with the square of the number of nodes, also needing lots of storage.