
BIG DATA

The term Big Data refers to massive, rapidly expanding and varied sets of unstructured digitized data. These massive data sets, which are difficult to maintain using traditional databases, are collected from multiple sources using a variety of methods.

Users leave behind digital traces from their online activities, such as shopping and social media shares. These traces provide meaningful information to the technologies of this new era, but they are just the tip of the iceberg. Big data can include digitized documents, photographs, videos, audio files, tweets and other social networking posts, e-mails, text messages, phone records, search engine queries, RFID tag and barcode scans and even financial transaction records. With advances in technology, the number and types of devices that produce data have been proliferating as well. Besides home computers and retailers’ point-of-sale systems, we have internet-connected smartphones, WiFi-enabled scales that tweet our weight, fitness sensors that track and sometimes share health-related data, cameras that can automatically post photos and videos online and global positioning system (GPS) devices that can pinpoint our location on the globe, to name a few. Big data technology is essentially the analysis of all these digital traces collected from various devices and sources so that they can be put to use.

A Short History of Big Data

Available data has increased continuously throughout human history, from the earliest primal writings to the latest data centers. As data accumulated into ever larger amounts, complex data storage systems became necessary. Although big data has existed for a long time, it remains a confusing subject for most people. The biggest challenge is processing it, and humanity has developed various methods to process data in accordance with its needs.

In ancient Mesopotamia, crop and herd information was recorded on clay tablets.

The earliest records of using data to track and control businesses date back 7,000 years, to when accounting was introduced in Mesopotamia to record the growth of crops and herds. Since then, methods of processing data have been developed for various purposes.

Natural and Political Observations Made upon the Bills of Mortality

In 1663, John Graunt recorded and examined all the information in London’s mortality rolls. He hoped to gain an understanding of, and build a warning system for, the ongoing bubonic plague. In the first recorded example of statistical data analysis, he gathered his findings in the book Natural and Political Observations Made upon the Bills of Mortality, which provides great insight into the causes of death in the seventeenth century.

Herman Hollerith and The Tabulating Machine

The American statistician Herman Hollerith invented a computing machine that could read holes punched into paper cards in order to organize census data in the 1890s. This machine allowed the United States census to be completed in only one year instead of eight, and it spread across the world, ushering in the modern data-processing age.

The first major data project of the 20th century was created in 1937 and was ordered by the Franklin D. Roosevelt administration in the USA. After the Social Security Act became law in 1935, the government had to keep track of contributions from 26 million Americans and more than 3 million employers. IBM got the contract to develop a punch card-reading machine for this massive bookkeeping project.

The first data-processing machine appeared in 1943 and was developed by the British to decipher Nazi codes during World War II. This device, named Colossus, searched for patterns in intercepted messages at a rate of 5000 characters per second.

Colossus Computer

In 1965 the United States government decided to build the first data center, to store over 742 million tax returns and 175 million sets of fingerprints by transferring all those records onto magnetic computer tape held in a single location. The project was never finished, but it is generally accepted as the beginning of the electronic data storage era.

Tim Berners-Lee, inventor of the World Wide Web

In 1989, British computer scientist Tim Berners-Lee invented what is today known as the World Wide Web, hoping to facilitate the sharing of information via a ‘hypertext’ system. Little did he know what the impact of his invention would be. As more and more devices connected to the internet through the 1990s, big data sets began to form.

In 2005 Roger Mougalas of O’Reilly Media, a learning company established by Tim O’Reilly, coined the term big data, only a year after the company created the term Web 2.0. This was also the year that Hadoop was created; it is an open-source software platform used for storing and processing data across a network of many computers, and it was later developed at massive scale by Yahoo!. Nowadays Hadoop is used by many organizations to crunch through huge amounts of data.
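Hadoop’s core processing model is called MapReduce: a “map” step turns raw records into key–value pairs, a “shuffle” step groups values by key, and a “reduce” step combines each group into a result. As an illustration only, here is a minimal single-machine sketch of that model in plain Python; it is not the Hadoop API, and in a real cluster each phase would run on many separate nodes.

```python
from collections import defaultdict

# A minimal, single-machine sketch of the MapReduce model that Hadoop
# distributes across many computers. Here the classic word-count example:
# the input documents and all names below are illustrative, not Hadoop APIs.

def map_phase(documents):
    # Map: emit a (key, value) pair for every word in every document.
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a single result.
    return {word: sum(values) for word, values in groups.items()}

docs = ["big data needs big tools", "data tools process data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # word frequencies across both "documents"
```

The appeal of the model is that the map and reduce steps are independent per record and per key, which is what lets Hadoop spread them across a network of cheap machines.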

In the past few years there has been a massive increase in big data startups, and more and more companies are adopting big data. As ever more things became connected, the amount of data exploded and the need for data scientists grew with it.

Advantages and Areas of Application of Big Data

The purpose of big data analytics is to analyze large data sets to help organizations make more informed decisions. These data sets might include web browser logs, clickstream data, social media content and network activity reports, text analytics of inbound customer e-mails, mobile phone call detail records and machine data captured by sensors.

Organizations from different backgrounds invest in big data analytics to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Big data analytics is used across countless industries, from the healthcare sector to education, media to the automotive industry, and even by governments.

Thanks to the benefits it provides, big data has become a key technology of today:

  • Identifying the root causes of failures and issues in real time
  • Fully understanding the potential of data-driven marketing
  • Generating customer offers based on their buying habits
  • Improving customer engagement and increasing customer loyalty
  • Reevaluating risk portfolios quickly
  • Personalizing the customer experience
  • Adding value to online and offline customer interactions
  • Improving decision making processes
  • Developing the education sector
  • Optimizing product prices
  • Improving recommendation engines
  • Developing the healthcare sector
  • Developing the agriculture industry

In the past, people utilized cumbersome systems to extract, transform and load data into giant data warehouses. Periodically, all the systems would back up and combine the data into a database where reports could be run. The problem was that database technology simply could not handle multiple continuous streams of data: it could neither cope with the volume nor modify incoming data in real time.
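The extract-transform-load (ETL) pattern described above can be sketched in a few lines. This is a hypothetical illustration: the raw records, field names and the in-memory SQLite “warehouse” are all assumptions chosen for brevity, not any specific product.

```python
import sqlite3

# A minimal extract-transform-load (ETL) sketch of the periodic batch
# pattern described above. The raw records and field names are invented
# for illustration; real pipelines pull from many source systems at once.

# Extract: raw records as they might arrive from a source system.
raw_logs = [
    {"user": "alice", "amount": "19.99", "ok": "true"},
    {"user": "bob", "amount": "bad-data", "ok": "true"},  # malformed row
    {"user": "carol", "amount": "5.00", "ok": "false"},
]

def transform(record):
    # Transform: validate and normalize one raw record, or drop it.
    try:
        return (record["user"], float(record["amount"]), record["ok"] == "true")
    except ValueError:
        return None  # discard rows that fail validation

# Load: write the cleaned rows into a warehouse table for reporting.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (user TEXT, amount REAL, ok INTEGER)")
rows = [r for r in (transform(x) for x in raw_logs) if r is not None]
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

total = db.execute("SELECT SUM(amount) FROM sales WHERE ok = 1").fetchone()[0]
print(total)  # reports run against the warehouse, not the raw stream
```

Because this style only runs in periodic batches, reports are always somewhat stale, which is exactly the limitation that streaming big data platforms set out to remove.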

Big data solutions offer cloud hosting, highly indexed and optimized data structures, automatic archival and extraction capabilities, and reporting interfaces designed to provide more accurate analyses that enable businesses to make better decisions. Businesses have been adopting big data technology intensively to reduce costs through more efficient sales and marketing and better decision making.
