At its core, Blockchain is data storage and retrieval. Typically the data is assumed to be a transaction. Hyperledger has expanded that assumption to include assets, accounts, permissions, etc. But what about images (or other large files)? The prevailing opinion seems to be that the Blockchain should hold only a pointer to the file, while the file itself lives somewhere else. This is fairly standard in general database design.
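To make the pointer approach concrete, here is a minimal sketch of the kind of small record that could go on the ledger in place of the image. The field names, the `make_pointer_record` helper, and the example URL are all assumptions for illustration, not part of any Hyperledger API. Adding a hash of the file is a common companion to the pointer, since it lets a reader detect later whether the external copy has changed.

```python
import hashlib
import json


def make_pointer_record(image_bytes: bytes, location: str) -> dict:
    """Build a small ledger record that points at an externally stored image.

    The record carries the location, a SHA-256 fingerprint of the content,
    and the size in bytes. The image itself is not included.
    """
    return {
        "location": location,  # hypothetical external storage URL
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "size_bytes": len(image_bytes),
    }


if __name__ == "__main__":
    fake_image = b"\x89PNG" + bytes(2_000_000)  # stand-in for a ~2 MB image
    record = make_pointer_record(fake_image, "https://storage.example.invalid/scan-001.png")
    serialized = json.dumps(record)
    print(serialized)
    print(len(serialized), "characters on the ledger instead of", len(fake_image), "bytes")
```

The serialized record stays at a couple of hundred characters no matter how large the image is, which is the whole appeal of the pointer approach.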
But should that be the case with Hyperledger? Perhaps the benefits of Blockchain encryption, security, immutability, etc., might outweigh the standard arguments for not storing images in a database. I wanted to explore this idea further, so I first decided to look at ‘can we’, and then look at ‘should we’.
In testing this, we see the same impact as with any database used to store images. Bandwidth is often the most limiting factor, especially once file sizes approach the several megabytes typical of a high-resolution cell phone picture. X-rays, CT or MRI scans, and other healthcare image files may be many times that size. With small images (under 100 KB), the time to transmit and update the master ledger, or to retrieve and render the image, was not noticeable from a user perspective. However, as we would suspect, image files of even 4 megabytes showed significant time lag, probably past the point of acceptability on networks with less than gigabyte bandwidth speeds.
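A rough back-of-the-envelope calculation shows why file size matters so quickly. The sketch below estimates raw transmission time only; it ignores latency, protocol overhead, encoding, and consensus, so real-world times will be longer. The link speeds are illustrative assumptions, not measurements from the testing described above.

```python
def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Estimate raw transfer time for a payload over a link.

    size_bytes: payload size in bytes
    link_mbps:  link speed in megabits per second (10**6 bits per second)
    """
    bits = size_bytes * 8
    return bits / (link_mbps * 1_000_000)


if __name__ == "__main__":
    sizes = {"100 KB image": 100_000, "4 MB photo": 4_000_000}
    for label, size in sizes.items():
        for mbps in (10, 100, 1000):  # hypothetical link speeds
            print(f"{label} at {mbps} Mbps: {transfer_seconds(size, mbps):.3f} s")
```

A single 4 MB photo needs about 3.2 seconds on a 10 Mbps link for one hop, before any of the Blockchain-specific work begins.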
Certain internal or business networks may have the bandwidth, and general consumer and mobile speeds may eventually increase enough to push the data quickly, but there are also the write and read times to consider. A typical Blockchain record may have a few hundred characters. A Base64 image string may have many thousands, if not hundreds of thousands, and a multi-megabyte photo runs into the millions. This will be true no matter the storage mechanism, but with a Blockchain we also have consensus and duplication across the peers to add to the equation. The image string will need to be transmitted to the other peers participating in consensus and written to their copy of the master ledger. This is already the largest time constraint within a Blockchain, and adding significantly to it is unlikely to be advisable.
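The size difference is easy to quantify. Base64 emits 4 characters for every 3 input bytes, so the encoded string is roughly a third larger than the file, and every peer stores its own copy. The sketch below works through that arithmetic; the helper names and the peer count are assumptions for illustration, and the total ignores block headers and any other ledger overhead.

```python
import base64


def base64_length(size_bytes: int) -> int:
    """Length of the padded Base64 encoding of size_bytes bytes of input."""
    return 4 * ((size_bytes + 2) // 3)


def replicated_characters(size_bytes: int, peers: int) -> int:
    """Total encoded characters stored when every peer keeps a full copy."""
    return base64_length(size_bytes) * peers


if __name__ == "__main__":
    photo = 4_000_000  # a 4 MB photo
    peers = 5          # hypothetical network size
    print("Encoded length:", base64_length(photo), "characters")
    print("Across", peers, "peers:", replicated_characters(photo, peers), "characters")
    # Sanity check of the formula against the standard library encoder
    sample = bytes(1000)
    print(base64_length(len(sample)) == len(base64.b64encode(sample)))
```

A 4 MB photo becomes a string of more than five million characters, compared with the few hundred of a typical record, and that cost is multiplied by the number of peers.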
But are there any arguments in favor of storing images directly in the Blockchain? Immutability is one of the key promises of Blockchain, and storing information outside of the Blockchain seems to increase the risk of unacceptable changes to the data. Of course, the image can be encrypted in the external storage and made decipherable only with authorized certificate keys, etc. But that’s a few more hoops to jump through in applications, and mistakes or laziness could potentially allow easier access than if the images were only in the Blockchain itself.
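One of those extra hoops can be sketched briefly. If the ledger holds a hash of the image, an application can at least detect that an externally stored copy was altered, even though it cannot prevent the change. This is a minimal sketch under that assumption; the `matches_ledger_hash` helper is hypothetical and is not part of Hyperledger.

```python
import hashlib
import hmac


def matches_ledger_hash(image_bytes: bytes, ledger_sha256_hex: str) -> bool:
    """Return True if the fetched image still matches the hash on the ledger."""
    actual = hashlib.sha256(image_bytes).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(actual, ledger_sha256_hex.lower())


if __name__ == "__main__":
    original = b"pretend these are image bytes"
    ledger_hash = hashlib.sha256(original).hexdigest()  # written at upload time

    tampered = original + b"!"
    print("original ok:", matches_ledger_hash(original, ledger_hash))
    print("tampered ok:", matches_ledger_hash(tampered, ledger_hash))
```

This covers integrity only. Confidentiality would still need encryption and key management in the external store, and every application that reads the image has to remember to run the check.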
The 1.0 Hyperledger release allows the use of a NoSQL database (CouchDB), which handles images in documents just fine, and this layer has been abstracted so that other storage options can be utilized in the future (StorJ or HDFS, anyone?). So read and write times may become irrelevant with those options. The time impact of consensus is still a concern, but some private Hyperledger Blockchain for Business networks may have only a few peers, which alleviates that to some degree.
So, ‘should we’ use the Blockchain to store images and other large files? I think the general arguments against using a database to store images are still mostly appropriate, and the processing and storage duplication across all peers is still mostly just wasteful and inefficient. But it is technically possible, the performance impact may be substantially mitigated by continued improvements in Hyperledger, and there may be some use cases that make it appropriate. My advice is to keep an open mind and not automatically carry over assumptions when working with Blockchain or other new technologies. Any given use case may still see enough advantage to justify using a Blockchain to store images.