The Big Data Storage Conundrum

Saurabh Saha

Merriam-Webster defines data as ‘factual information used as a basis for reasoning, discussion, or calculation’.

Data has been growing exponentially since the dawn of human civilization.

As Eric Schmidt, chairman of Google, states: “From the dawn of civilization until 2003, humankind generated five exabytes (1 exabyte = 1 billion gigabytes) of data. Now we produce five exabytes every two days, and the pace is accelerating”.
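To put the quoted figures in perspective, here is a back-of-the-envelope calculation of the rate implied by “five exabytes every two days”, using only the conversion given in the quote itself:

```python
# Back-of-the-envelope: how fast is "five exabytes every two days"?
# Uses the quote's own conversion: 1 exabyte = 1 billion gigabytes.

EXABYTE_IN_GB = 1_000_000_000     # 1 EB = 10^9 GB, as quoted
data_per_two_days_eb = 5          # "five exabytes every two days"

gb_per_day = data_per_two_days_eb * EXABYTE_IN_GB / 2
gb_per_second = gb_per_day / (24 * 60 * 60)

print(f"{gb_per_day:,.0f} GB per day")        # 2,500,000,000 GB per day
print(f"{gb_per_second:,.0f} GB per second")  # roughly 28,935 GB per second
```

That works out to tens of thousands of gigabytes produced every second, which is the scale conventional storage systems are being asked to absorb.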

Now the question is: how exactly is data increasing at such a dramatic pace? To answer this we have to rewind a few years to when Web 2.0 was taking shape. Before that, Web 1.0 was merely a means for computer systems to connect over a network. Mundane activities could be accomplished over Web 1.0, but when it came to democratizing the web, Web 1.0 had its limitations. It was then that the idea of Web 2.0 was realized.

Web 2.0 gave users the liberty to create accounts or profiles on a site and to comment, tag, socially bookmark and more. It provided all the necessary tools for manifesting collective intelligence. It was at the onset of this transition that social networks appeared on the scene and revolutionized the entire Web 2.0 experience. Social networking tools, which started with MySpace and Orkut, evolved into hybrid platforms like Facebook, Twitter, Instagram and Tumblr. Of these, Facebook can be credited as the most powerful social networking tool, spreading like wildfire to a current user base of over a billion people worldwide.

Too Much Facebook

With the advent of tools like Facebook and Twitter, the web was overwhelmingly democratized. As Thomas Friedman described in his famous book ‘The World Is Flat’, technological innovations like these cut down geographical barriers, and it became easier for denizens of the modern world to communicate with each other. Social networking has since found a plethora of applications in the government, social and commercial sectors.

Businesses identified the social outreach that networking tools like Facebook and Twitter offered and started using them to evangelize their products and services. Professional networks like LinkedIn emerged to help recruiters and organizations hire the best talent available in the market. Governments and academia recognized the importance of social networks and started leveraging them towards their own objectives.

It has been a while now, and social networking tools like Facebook and Twitter have become billion-dollar businesses catering to the biggest organizations worldwide. In other words, they have become an integral part of our digital landscape. However, every invention has a downside. The downside that organizations worldwide have started identifying is the accumulation of massively large datasets that have become increasingly difficult to process using conventional data frameworks like relational databases. Until a while back, data from a website or a web application was stored in a relational database management system (RDBMS).

From Collecting Customer Data to Understanding Consumer Behavior

Since the volume of data was not humongous, the rate at which it could be fetched was fast, which facilitated all necessary transactions. Now, with an increasingly complex web architecture and huge datasets, conventional database systems have started showing performance issues, not because they are inefficient but because they have reached their performance thresholds. To tackle this, frameworks like Hadoop and storage systems like MongoDB have sprung up, with the capacity to store and analyze massive datasets. As a result, organizations worldwide are investing billions of dollars in establishing ‘server farms’ to store these massive datasets for their own use, since a large part of their revenue depends on how effectively customer data can be scanned to understand consumer behavior.

According to Mark Zuckerberg, the billionaire CEO of Facebook, next year people will share twice as much information as they share this year, and the year after that they will share twice as much again. Call it Zuckerberg’s law or something else, but it will surely become a predicament when datasets grow so large that current server storage starts falling short from a physical as well as an economic point of view. It is then that the digital revolution would cease to move any further. Besides, physical storage is always subject to wear and tear.
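Zuckerberg’s law is simply exponential growth, and its consequences are easy to project. A minimal sketch, using a purely illustrative baseline volume rather than any real figure:

```python
# Project the volume of data shared per year if sharing doubles
# annually ("Zuckerberg's law"). The baseline is hypothetical,
# chosen only to illustrate the shape of the growth.

def projected_volume(base_volume: float, years: int) -> float:
    """Volume shared `years` years from now if sharing doubles every year."""
    return base_volume * 2 ** years

base = 1.0  # say, 1 exabyte shared this year (illustrative, not a real figure)
for year in range(5):
    print(f"year {year}: {projected_volume(base, year):g}")
# Doubling every year means roughly a 1000x increase per decade (2^10 = 1024),
# which is why storage capacity planning becomes a predicament so quickly.
```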


Imagine a future where datasets have become so huge that conventional storage fails to do them justice and, deprived of data analysis, the entire human population is plunged into anarchy and chaos under an oppressive, tyrannical authority that has nullified the democratization and dissemination of information, much like the dystopia of George Orwell’s much-talked-about novel Nineteen Eighty-Four. The situation could be termed a data explosion, and it could spell disaster for the human species.

Luckily for us, a team of researchers has found a way of storing data on DNA molecules, known as synthetic DNA storage. Nature has been storing the blueprint of life in the DNA of every cell. Inspired by this, a team headed by Dr Manish Kumar Gupta of DAIICT, Gujarat, has created a software tool called ‘DNA Cloud’ which encodes a data file in any format (.txt, .pdf, .png, .mkv, .mp3, etc.) into DNA and decodes it back to retrieve the original file.
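The details of DNA Cloud’s actual encoding are not reproduced here, but the core idea of mapping binary data onto the four DNA bases can be illustrated with a toy two-bits-per-base scheme (an illustration only, not the DNA Cloud algorithm, which additionally handles error correction and sequencing constraints):

```python
# Toy illustration: map each pair of bits to one of the four DNA bases.
# Real schemes (including DNA Cloud) add error correction and avoid
# long runs of the same base; this sketch shows only the core mapping.

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Encode bytes as a DNA sequence, 4 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> bytes:
    """Recover the original bytes from a DNA sequence."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

sequence = encode(b"Hi")
print(sequence)                   # CAGACGGC
print(decode(sequence))           # b'Hi'
```

Because any file is ultimately a byte stream, the same round trip works for a .txt, .pdf or .mp3 alike, which is what lets a tool like DNA Cloud accept arbitrary file formats.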

This approach is revolutionary and could ease the Big Data crisis for good: recently a bioengineer and geneticist at Harvard’s Wyss Institute was able to store as much as 5.5 terabytes of data in a single gram of DNA. The best part about storing data on DNA is that it is remarkably durable and can preserve data despite adverse conditions in the surrounding environment. Thus the conundrum that Big Data has created now has a promising solution, one capable of preserving massive datasets for ages to come.