Data is the most fundamental ingredient of effective Business Intelligence, which gives management the insights it needs to make informed decisions and to plan the future of the organization. Let's look in detail at the difference between two often-confused terms, Data Analysis and Big Data, and at why Big Data Analytics is different from normal Data Analytics.
Data Analysis and Big Data are not the same
Consider a book shop with sales and purchase data for an entire year, containing figures for the number of customers, sales of books of each genre/author in each month, the amount of purchases made by each customer, etc. This data can be used to derive business intelligence for the book shop, answering questions such as:
· Which genres of books sell best in which season?
· What is the average purchase capacity of the customers?
· Which author was in demand this year?
· Which month sees the highest book sales? And so on…
Finding answers to these questions is Data Analysis, and the associated methodology is Data Analytics: organizing data and deriving business intelligence from it.
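As a minimal sketch, assuming a small, invented list of sales records (the `SALES` data and every function name below are hypothetical, not real figures), answering questions like those above comes down to grouping and summing. "Average purchase capacity" is interpreted here, as an assumption, as the average total spend per customer.

```python
from collections import defaultdict

# Hypothetical sales records, invented for illustration:
# (month, genre, author, customer_id, amount)
SALES = [
    ("Jan", "Mystery", "Author A", "c1", 12.0),
    ("Jan", "Romance", "Author B", "c2", 8.0),
    ("Feb", "Mystery", "Author A", "c1", 15.0),
    ("Dec", "Fantasy", "Author C", "c3", 30.0),
    ("Dec", "Mystery", "Author A", "c2", 20.0),
]

# Field positions within each record tuple.
MONTH, GENRE, AUTHOR, CUSTOMER, AMOUNT = range(5)


def totals_by(records, field):
    """Sum sale amounts grouped by the value of the given field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[field]] += record[AMOUNT]
    return dict(totals)


def busiest_month(records):
    """Month with the largest total sales (ties go to the first month seen)."""
    totals = totals_by(records, MONTH)
    return max(totals, key=totals.get)


def top_author(records):
    """Author whose books brought in the most revenue."""
    totals = totals_by(records, AUTHOR)
    return max(totals, key=totals.get)


def average_spend_per_customer(records):
    """Average total spend per distinct customer (assumed stand-in for 'purchase capacity')."""
    totals = totals_by(records, CUSTOMER)
    return sum(totals.values()) / len(totals)


if __name__ == "__main__":
    print(busiest_month(SALES))               # Dec
    print(top_author(SALES))                  # Author A
    print(average_spend_per_customer(SALES))  # about 28.33
```

With a year of real records the same grouping applies unchanged; only the size of the list grows, which is exactly what stops being true once the data becomes Big Data.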
Suppose the bookstore now opens an online store, promotes its products on various social media networks, and accepts payments through various mobile payment platforms. It can now track not only what customers bought, but also what else they looked at, how they navigated through the site, and how much they were influenced by promotions, reviews, and page layouts. It can even develop algorithms to predict which books individual customers would like to read next, algorithms that perform better every time a customer responds to or ignores a recommendation.
These channels generate far more data, and of many more types: transaction details, preferences, tweets, uploaded images, comments, emails, page views, and recommendations, in addition to the sales and purchase data the bookstore generated earlier. This huge amount of data from many sources has storage and analysis requirements of an altogether different nature. For large organizations, this data can run into petabytes.
It is estimated that Walmart collects more than 2.5 petabytes of data every hour from its customer transactions. A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets’ worth of text. (Source: Harvard Business Review)
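To put the Walmart figure on a more familiar scale, a quick unit conversion helps. This sketch assumes decimal (SI) units, so one petabyte is 10^15 bytes ("one quadrillion"), and since the source says "more than" 2.5 petabytes, the results are lower bounds.

```python
# Unit arithmetic for the Walmart figure quoted above.
# Assumes decimal (SI) units: 1 petabyte = 10**15 bytes ("one quadrillion").
PETABYTE = 10**15
PB_PER_HOUR = 2.5  # "more than 2.5 petabytes of data every hour", so a lower bound

bytes_per_hour = PB_PER_HOUR * PETABYTE
bytes_per_day = bytes_per_hour * 24
bytes_per_second = bytes_per_hour / 3600

print(f"{bytes_per_day / PETABYTE:.0f} PB per day")        # 60 PB per day
print(f"{bytes_per_second / 10**9:.0f} GB per second")    # 694 GB per second
```

In other words, roughly 60 petabytes a day, or close to 700 gigabytes every second, from transactions alone.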
This kind of data is called Big Data.
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. Source: Wikipedia
How Big Data is different
Business executives often confuse Data Analytics with Big Data. The two are related, but Big Data differs in four key respects, often summarized as the four Vs: volume, variety, velocity, and veracity.
Volume
With 2 billion PCs and 6 billion cellphones in the world, nearly everyone on earth is now a data generator. As of 2012, an estimated 2.5 quintillion bytes of data were generated in the world each day, and that number is doubling every 40 months or so. More data cross the internet every second than were stored in the entire internet just 20 years ago.
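The doubling claim implies simple exponential growth: if V0 is the 2012 daily volume (2.5 quintillion, i.e. 2.5 × 10^18 bytes) and t is measured in months, then V(t) = V0 · 2^(t/40). The sketch below just evaluates that formula; it assumes the doubling rate stays constant, which the source does not claim, so treat it as an illustration rather than a forecast. The function names are hypothetical.

```python
# Illustrative extrapolation of the figures above, not a forecast.
# Assumptions: 2.5 quintillion (2.5 * 10**18) bytes/day in 2012,
# doubling every 40 months, i.e. V(t) = V0 * 2 ** (t / 40) with t in months.
BASELINE_BYTES_PER_DAY = 2.5e18
DOUBLING_PERIOD_MONTHS = 40


def growth_factor(months):
    """How many times larger the daily volume is after `months` of steady doubling."""
    return 2 ** (months / DOUBLING_PERIOD_MONTHS)


def projected_daily_bytes(months_after_2012):
    """Daily data volume implied by the doubling rate, `months_after_2012` months on."""
    return BASELINE_BYTES_PER_DAY * growth_factor(months_after_2012)


for years in (0, 5, 10):
    months = 12 * years
    print(f"{years:>2} years: x{growth_factor(months):.1f}, "
          f"{projected_daily_bytes(months) / 1e18:.1f} quintillion bytes/day")
```

Ten years is 120 months, or three doubling periods, so the volume grows by a factor of 2^3 = 8, to about 20 quintillion bytes a day under these assumptions.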
Variety
Big data comes in the form of messages, updates, images, and videos posted to social networks; readings from various sensors; GPS signals from cell phones; healthcare data; and more. Many of the most important sources among these are relatively new. The huge amounts of information from social networks, for example, are only as old as the networks themselves (Facebook was launched in 2004, Twitter in 2006). The same holds for smartphones and other mobile devices that now provide enormous streams of data tied to people, activities, and locations.
Velocity
In some areas, the speed at which data is created matters as much as its volume. In marketing, real-time data lets an organization be more agile and act effectively. Continuing our bookstore example, the moment a customer posts a comment saying a particular book is overpriced in this store, management can take corrective action, protecting its reputation and keeping its loyal customers.
Veracity
Because this data comes from many sources that the organization does not control, there is uncertainty about its accuracy and reliability. According to one estimate, poor data quality costs the US economy around $3.1 trillion every year.
Because of these differences, Big Data Analytics relies on different technologies and platforms (Hadoop, etc.) to extract business intelligence and help management make informed decisions.
Source: Wikipedia, HBR.org, IBM