In this article, we explain the key notions and principles needed to understand how a blockchain works, such as the concepts of hashing and mining, and highlight its main properties. We also discuss the reasons why blockchain is such a disruptive technology and how it can impact certain fields of application such as finance and agriculture.
Although this article contains some technical details, we do not show how to implement a blockchain. However, if you are interested in learning how to develop a blockchain system, you can read this tutorial (using Anaconda Notebook), where we implement a block prototype from scratch. Note that this tutorial is for educational purposes only and does not attempt to create full-fledged blockchain software.
Why do we need blockchain?
When we vote, what guarantees that our vote is actually taken into account? When we buy coffee labeled “fair trade”, what makes us certain about its origin? When we buy something on the Internet, how do we know that our payment is correctly accounted for? Or when we read an article such as this one, what makes us so sure about the veracity of its content?
To be really sure about the authenticity of any piece of information, such as media, documents or transactions, we usually put our trust in a third party, such as an institution, a company or a government, that will hopefully take charge of carrying out the verification process. The problem with this trust-based approach is that the third party often needs to put its trust in another third party as well, thus spreading the need for trust. This can considerably increase the number of intermediaries that need to be trusted and, eventually, paid.
To avoid trusting third parties, we need a system where all records are public and can be stored and verified by everyone, and where security is guaranteed so that nobody can cheat the system by tampering with or editing any registered record. Systems like this are now emerging thanks to a technology called blockchain.
What is a blockchain?
As its name suggests, a blockchain is a chain of blocks, where each block contains data in the form of a group of records, transactions or items. Blocks have a timestamp and are arranged in chronological order to form a chain, where the oldest block is at the tail and the most recent block is at the head of the chain. These blocks are also connected to each other through their so-called “hashes”, where each block contains the hash of its preceding block.
In a blockchain system, all data is public and everybody can access or add information to the system. However, nobody can modify or delete any data that is already present in the system. This means that once a block of records is added to the chain, it cannot be modified in any way. This is achieved thanks to the use of “hashes” along the chain, which prevents attackers from arbitrarily changing data already stored in the system.
A blockchain stores data across a network of personal computers distributed around the world. Everyone can run the system on their own personal device without the need for a central authority. In theory, this means that nobody (not even a government or a company) owns the system, yet everyone can use it.
To sum up, a blockchain is like a chronological, irreversible, publicly available online database, where everybody can read and add new information but cannot change any information that is already present in the system. Hence, a blockchain belongs both to everybody and to nobody.
What is a hash?
All digital media like documents, movies, and music are just strings of binary digits: 0’s and 1’s. A so-called “one-way hash function” is a mathematical function that takes any digital media as input and runs an algorithm on it to produce a fixed-length digital value, unique for all practical purposes, known as a “hash” or “digital signature”, which is usually much smaller than the original input media. Each time the same digital media is put through the hash function, the same hash value is produced.
Assuming that the media is a plain-text document, if even a single character is changed in this document, the resulting hash would be completely different from the hash of the original document. Therefore, if any modification is attempted on the original media, this change could easily be spotted by comparing the original hash with the hash of the modified media.
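The behaviour described above can be tried out with a few lines of Python. This is only a minimal sketch: it assumes SHA-256 (from Python's standard hashlib module) as the hash function, and the two example sentences are made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a text as a 64-character hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two documents that differ by a single character (illustrative data).
original = "Alice pays Bob 10 coins"
modified = "Alice pays Bob 11 coins"

hash_original = sha256_hex(original)
hash_modified = sha256_hex(modified)

print(hash_original)
print(hash_modified)

# Hashing the same input again always gives the same value.
print("Same input, same hash:", sha256_hex(original) == hash_original)  # True

# A one-character change is revealed by comparing the two hashes.
print("Modification detected:", hash_original != hash_modified)  # True
```

Whatever the length of the input, the output here is always 64 hexadecimal characters (256 bits), which is what makes the comparison cheap.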
The mathematics behind the hash function ensures that there is no practical way to derive the original digital media from its hash, thus making the hash function “one-way”. This property allows the hash value to be used to verify data authenticity and integrity. If an input producing a given hash value could be derived from that value, an attacker could easily craft modified media that produces the same hash, making the whole hashing process irrelevant.
How are hashes used in a blockchain?
In addition to transactions, each block also contains the hash of its preceding block. This means that the digital signature of the current block is based on the digital signature of the previous block, which is in turn based on the signature of the block before it, and so on. Blocks are thus linked together through their hashes, forming a list.
This means that if an attacker changes anything in a past transaction, he/she will not only break the digital signature of the block holding this transaction, but also the signatures of all later blocks, thus creating a completely new version of the blockchain. At this point, the attacker has a completely different version of the blockchain that does not correspond to the others’ blockchain, making it easy to detect and reject any attempt made by the attacker to add a new block. Therefore, even if only a small amendment is made to any piece of data stored in the blockchain, the chain of digital signatures will be broken, resulting in a completely different blockchain that will be rejected by other users.
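To make this chaining effect concrete, here is a minimal sketch in Python. It is not how any particular blockchain is implemented: the block layout (a dictionary with "records" and "previous_hash"), the helper names and the use of SHA-256 over a JSON dump are all assumptions chosen for illustration.

```python
import copy
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a whole block, including the hash of its predecessor."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(batches: list) -> list:
    """Create one block per batch of records, each linked to the previous block."""
    chain = []
    previous_hash = "0" * 64  # placeholder value for the very first block
    for records in batches:
        block = {"records": list(records), "previous_hash": previous_hash}
        chain.append(block)
        previous_hash = block_hash(block)
    return chain


def is_valid(chain: list) -> bool:
    """Check that every block stores the actual hash of the block before it."""
    return all(
        current["previous_hash"] == block_hash(previous)
        for previous, current in zip(chain, chain[1:])
    )


chain = build_chain([["Alice pays Bob 10"], ["Bob pays Carol 4"], ["Carol pays Dave 1"]])
honest_head = block_hash(chain[-1])  # what the other users see as the latest hash

# The attacker edits a record in the oldest block: the links no longer match.
tampered = copy.deepcopy(chain)
tampered[0]["records"][0] = "Alice pays Bob 1000"
print(is_valid(chain))     # True
print(is_valid(tampered))  # False

# Re-linking every later block repairs the chain internally, but the result
# is a different blockchain whose latest hash no longer matches the others'.
rebuilt = build_chain([block["records"] for block in tampered])
print(is_valid(rebuilt))                        # True
print(block_hash(rebuilt[-1]) == honest_head)   # False
```

The last two lines show why re-computing the hashes does not help the attacker: the rebuilt chain is consistent with itself but not with the copies held by everybody else.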
To preserve data integrity inside a blockchain, “Merkle Trees” are binary trees used inside blocks to hash transactions. For instance, let us assume that a block holds 4 transactions (Tx0, Tx1, Tx2 and Tx3). Every transaction is first passed through the hash function to produce 4 different hashes. Pairs of hashes are then combined and passed through the hash function again, creating hashes of hashes. In our example, the hashes of Tx0 and Tx1 are combined and hashed together, and so are the hashes of Tx2 and Tx3. This process is repeated until a single hash remains, called the “root hash”, forming a complete binary tree.
Merkle trees allow the detection of any change in the transactions of a block. By re-computing the root hash and comparing it with the original one, we can easily spot any altered transaction, as the root hash will be completely different. Additionally, Merkle trees also improve storage efficiency. When transactions are buried under enough blocks, they no longer need to be stored within the block, since the root hash kept in the block still vouches for them.
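The four-transaction example can be sketched in Python as follows. This is a simplified illustration, not the exact procedure of any real blockchain: it assumes SHA-256, combines hashes by concatenating their hexadecimal strings, and duplicates the last hash when a level has an odd number of entries. The function names are ours.

```python
import hashlib


def sha256_hex(data: str) -> str:
    """Return the SHA-256 hash of a string in hexadecimal form."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def merkle_root(transactions: list) -> str:
    """Compute the root hash of a list of transactions."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    # Leaves of the tree: one hash per transaction.
    level = [sha256_hex(tx) for tx in transactions]
    # Combine hashes pairwise until a single hash remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        level = [
            sha256_hex(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


transactions = ["Tx0", "Tx1", "Tx2", "Tx3"]
root = merkle_root(transactions)
print(root)

# Altering a single transaction changes the root hash.
altered = ["Tx0", "Tx1", "Tx2 (edited)", "Tx3"]
print(merkle_root(altered) != root)  # True
```

With four transactions the loop runs twice: once to produce the two intermediate hashes (Tx0 with Tx1, Tx2 with Tx3) and once to combine them into the root.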
How to add data into a blockchain?
In a blockchain, everybody can insert new blocks. But how can we concurrently handle all those block insertions? For a user to be allowed to insert a new block, he/she must first solve a problem and show proof of the solution to other users. In the Bitcoin protocol, this process is known as “mining”. The first computer to find the solution wins and can add a new block containing the user’s transactions.
The given problem is computationally hard to solve and requires heavy use of CPU and electricity to find (mine) the answer. This challenge-based process can thus take a lot of time, depending on the available hardware resources. Once the solution is found by a computer, this solution, a.k.a. “proof-of-work”, is broadcast to all other computers. Each computer verifies that the given solution does indeed solve the problem before adding the new block to its local copy of the blockchain. When enough computers have accepted this proof-of-work, the new block is added permanently to the blockchain.
This process of mining, based on the Hashcash algorithm, requires computers (a.k.a. miners) to solve a problem with a known partial input derived from the latest state of the blockchain (e.g. input must start with “101001”) to create a specific hash target (e.g. hash must start with “0000”). The partial input is actually the new block itself, whereas the proof-of-work, called a “nonce”, is a value that is added at the end of the new block so that the block hash will start with zeroes. As the hash function is one-way, computers must try many inputs to create the right hash target. The first computer to solve the problem wins, and the other computers start working on a new problem derived from the latest added block.
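The search for a nonce can be illustrated with a short Python sketch. It is a toy version written under several assumptions: SHA-256 as the hash function, a target expressed as a number of leading zeroes in the hexadecimal hash, a nonce appended as text to the block data, and made-up block contents. Real protocols differ in all of these details.

```python
import hashlib


def block_hash(block_data: str, nonce: int) -> str:
    """Hash the block data with the nonce appended at the end."""
    return hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()


def mine(block_data: str, difficulty: int) -> int:
    """Try nonces one by one until the hash starts with `difficulty` zeroes."""
    target_prefix = "0" * difficulty
    nonce = 0
    while not block_hash(block_data, nonce).startswith(target_prefix):
        nonce += 1
    return nonce


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a proposed nonce takes a single hash computation."""
    return block_hash(block_data, nonce).startswith("0" * difficulty)


# Illustrative block contents; the hash target is "0000" as in the text.
block_data = "previous_hash=101001...;Alice pays Bob 10;"
difficulty = 4

nonce = mine(block_data, difficulty)
print("nonce found:", nonce)
print("block hash: ", block_hash(block_data, nonce))
print("accepted by other computers:", verify(block_data, nonce, difficulty))  # True
```

The asymmetry is the point of the design: mining needs tens of thousands of attempts on average at this difficulty, and each extra required zero multiplies the work by 16, whereas verification always costs one hash.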
Does blockchain really remove the need for trust?
Technically, we can assume that blockchain removes the need for trust and replaces it with a cryptographic proof. This process allows any two willing parties to transact directly with each other without the need for a third party.
However, blockchain does not completely remove the notion of trust. It is indeed true that we no longer need to trust a third party, but the notion of trust has not totally disappeared, as it has shifted from the third party to the blockchain system itself. In fact, we now need to trust the blockchain system or, at the very least, the developers who implemented the system, thus reintroducing a new third party (here the software company). Therefore, for the blockchain to be used to its full potential, it is important for the system to be not only secure, but also public, open-source and distributed around the world, so that the system does not belong to anyone. As there is no longer any central point, it becomes even more difficult for attackers to corrupt or take down the system.
Does a blockchain really tell the truth?
Blockchain technology can help in many ways to track and ensure that a specific piece of information is reliable. As explained previously, a blockchain uses cryptography to ensure that recorded data cannot be counterfeited or changed by anyone. Furthermore, thanks to the process of hashing, the security of past transactions increases over time as previous blocks are buried deeper into the chain. But how reliable is a piece of information stored in a blockchain, really? And what if the initial data is wrong in the first place?
It therefore becomes essential to use other means to make sure that the information stored inside a blockchain is actually true. Other cryptographic techniques such as authentication, digital signatures and smart certificates should be used in symbiosis with blockchain, so that faking information becomes almost impossible. Nevertheless, this does not yet completely remove the need for traditional on-site evidence and verification, since the digitalization of our society is still an ongoing process and many people still depend on more conventional approaches.
From protecting our identities to managing a world that is increasingly dependent on digital media and technologies, the possibilities for blockchain applications seem endless. But whether it lives up to its promise remains to be seen. Yet one thing is certain: blockchain technology is now available, and it is up to our society to decide whether to use these innovations as we move every day towards a more and more digitalized world.