Types of Big Data
Traditional enterprise data includes customer information from CRM systems, transactional ERP data, web store transactions and general ledger data.
Machine-generated/sensor data includes Call Detail Records (“CDR”), web logs, smart meters, manufacturing sensors, equipment logs (often referred to as digital exhaust) and trading systems data.
Social data includes customer feedback streams, micro-blogging sites like Twitter and social media platforms like Facebook.
Value — The economic value of different data varies significantly. Typically there is good information hidden among a larger body of non-traditional data; the challenge is identifying what is valuable and then transforming and extracting that data for analysis.
Building a Big Data Platform
As with data warehousing, web stores or any IT platform, an infrastructure for big data has unique requirements. In considering all the components of a big data platform, it is important to remember that the end goal is to easily integrate your big data with your enterprise data to allow you to conduct deep analytics on the combined data set.
Acquire Big Data
The acquisition phase is one of the major changes in infrastructure from the days before big data. Because big data refers to data streams of higher velocity and higher variety, the infrastructure required to support the acquisition of big data must deliver low, predictable latency in both capturing data and executing short, simple queries; handle very high transaction volumes, often in a distributed environment; and support flexible, dynamic data structures. NoSQL databases are frequently used to acquire and store big data. They are well suited for dynamic data structures and are highly scalable. The data stored in a NoSQL database is typically of a high variety because the systems are intended to simply capture all data without categorizing and parsing it into a fixed schema.
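To make the idea of schema-free capture concrete, here is a minimal sketch. It is not the API of any particular NoSQL product: the `EventStore` class, its `put` and `get` methods, and the sample keys and fields are all hypothetical, and an in-memory dictionary stands in for a distributed store. The point is only that records with different fields are accepted as they arrive, with no fixed schema enforced at write time.

```python
import json


class EventStore:
    """Hypothetical in-memory stand-in for a NoSQL key-value store."""

    def __init__(self):
        self._records = {}

    def put(self, key, record):
        # Serialize the record as-is; no field validation, no fixed schema.
        self._records.setdefault(key, []).append(json.dumps(record))

    def get(self, key):
        # Deserialize every record captured under this key.
        return [json.loads(raw) for raw in self._records.get(key, [])]


store = EventStore()
# Two readings from the same (made-up) smart meter with different fields.
store.put("meter-17", {"ts": 1, "kwh": 0.42})
store.put("meter-17", {"ts": 2, "kwh": 0.40, "voltage": 229.8})
# A social-data record with an unrelated shape goes into the same store.
store.put("user-9", {"ts": 3, "text": "great service", "lang": "en"})

# The set of fields is discovered at read time rather than declared up front.
meter_fields = sorted({name for rec in store.get("meter-17") for name in rec})
print(meter_fields)
```

Deciding which fields matter is deferred to the later organize and analyze phases, which is the trade-off the paragraph above describes.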
Organize Big Data
In classical data warehousing terms, organizing data is called data integration. Because there is such a high volume of big data, there is a tendency to organize data at its initial destination location, thus saving both time and money by not moving around large volumes of data. The infrastructure required for organizing big data must be able to process and manipulate data in the original storage location; support very high throughput (often in batch) to deal with large data processing steps; and handle a large variety of data formats, from unstructured to structured.
Hadoop is a new technology that allows large data volumes to be organized and processed while keeping the data on the original data storage cluster. The Hadoop Distributed File System (HDFS) is, for example, the long-term storage system for web logs. These web logs are turned into browsing behavior (sessions) by running MapReduce programs on the cluster and generating aggregated results.
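The shape of such a job can be sketched in plain Python. This is a single-process illustration of the map, shuffle and reduce steps, not Hadoop code. The log line format (`user_id timestamp url`), the 30-minute inactivity threshold and all function names are assumptions made for the example.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold between sessions


def map_phase(line):
    """Emit (user_id, timestamp) from an assumed 'user_id timestamp url' line."""
    user_id, timestamp, _url = line.split()
    return user_id, int(timestamp)


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(user_id, timestamps):
    """Count sessions for one user: a new session starts after a long gap."""
    ordered = sorted(timestamps)
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > SESSION_GAP_SECONDS:
            sessions += 1
    return user_id, {"sessions": sessions, "page_views": len(ordered)}


# Made-up log lines for illustration only.
log_lines = [
    "alice 1000 /home",
    "alice 1300 /products",
    "bob 1500 /home",
    "alice 9000 /checkout",
]

grouped = shuffle(map_phase(line) for line in log_lines)
summary = dict(reduce_phase(user, stamps) for user, stamps in grouped.items())
print(summary)
```

On a real cluster the map and reduce functions would run in parallel on the nodes that hold the log blocks, so only the much smaller aggregated output needs to leave the cluster.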
Analyze Big Data
Since data is not always moved during the organization phase, the analysis may also be done in a distributed environment, where some data will stay where it was originally stored and be transparently accessed from a data warehouse. The infrastructure required for analyzing big data must be able to support deeper analytics, such as statistical analysis and data mining, on a wider variety of data types stored in diverse systems; scale to extreme data volumes; deliver faster response times driven by changes in behavior; and automate decisions based on analytical models. Most importantly, the infrastructure must be able to integrate analysis on the combination of big data and traditional enterprise data. New insight comes not just from analyzing new data, but from analyzing it within the context of the old to provide new perspectives on old problems.
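As a rough illustration of analyzing the combination of big data and enterprise data, the sketch below joins hypothetical aggregated browsing data with hypothetical CRM records and derives one combined metric. All names, fields and values are invented for the example. In practice this join would more likely be a SQL query in the data warehouse than application code.

```python
# Hypothetical enterprise data, e.g. exported from a CRM system.
crm_customers = {
    "alice": {"segment": "premium", "lifetime_orders": 12},
    "bob": {"segment": "standard", "lifetime_orders": 1},
}

# Hypothetical aggregated big data, e.g. browsing sessions derived from web logs.
session_summary = {
    "alice": {"sessions": 2, "page_views": 3},
    "bob": {"sessions": 1, "page_views": 1},
    "carol": {"sessions": 4, "page_views": 9},
}


def combine(customers, sessions):
    """Inner-join browsing activity with customer profiles on user id."""
    combined = []
    for user_id, activity in sorted(sessions.items()):
        profile = customers.get(user_id)
        if profile is None:
            continue  # visitor with no matching enterprise record
        combined.append({"user_id": user_id, **profile, **activity})
    return combined


def views_per_order(row):
    """A metric that needs both sources: browsing effort per historical order."""
    return row["page_views"] / row["lifetime_orders"]


rows = combine(crm_customers, session_summary)
for row in rows:
    print(row["user_id"], row["segment"], views_per_order(row))
```

Neither source alone can produce the last metric: page views come from the web logs and order history from the enterprise system.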
- Full distribution of Cloudera’s Distribution including Apache Hadoop (CDH4).
- Oracle Big Data Appliance Plug-In for Enterprise Manager.
- Cloudera Manager to administer all aspects of Cloudera CDH.
- Oracle distribution of the statistical package.
- Oracle NoSQL Database Community Edition-2.
- Oracle Enterprise Linux operating system and Oracle Java VM.