Where Business Analytics, Data Warehousing, and Big Data Intersect.
For the past 30 years, with the advent of the relational database as the de facto enterprise IT standard for Corporate America, organizations have attempted to gain further business insight by storing and massaging their data in a central repository (also known as a data warehouse). As the technology matured during the 1990s and 2000s, the field of business intelligence and data warehousing took off, with Gartner estimating the total 2016 spend at $16.9B.
Until very recently, most of that spend went to the traditional IT-centric service model, where an operational reporting or analytics need would turn into a traditional data warehouse project, front-ended with a traditional business intelligence tool. In recent years, the monolithic data warehouse fell out of favor and was replaced by a more iterative and agile approach, but one with a similar design topology and data architecture: a star schema, an ETL layer for data transformation, and a BI platform for reporting and analytics. More often than not, these systems were expensive to build and did not live up to their lofty goals. Their care and feeding requires 24/7 support, especially for the ETL component, whose loads update the warehouse incrementally or overnight. It is not unusual for an ETL process in a mature warehouse to run 6-10 hours, and in many cases much longer. By today’s standards, that is not exactly technological light speed, and users deserve better.
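To make the star schema and incremental ETL load concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table names (dim_customer, fact_sales), the function names, and the watermark approach are illustrative assumptions, not a description of any particular warehouse product; a real ETL layer would add many more dimensions, validation, and error handling.

```python
import sqlite3


def build_star_schema(conn):
    """Create one dimension table and one fact table (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_id  TEXT UNIQUE NOT NULL
        );
        CREATE TABLE fact_sales (
            sale_id      INTEGER PRIMARY KEY,
            customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
            amount       REAL NOT NULL
        );
    """)


def incremental_load(conn, source_rows, last_loaded_id):
    """Load only source rows newer than the watermark; return the new watermark."""
    new_rows = [r for r in source_rows if r["sale_id"] > last_loaded_id]
    for row in new_rows:
        # Transform step: map the source's natural key to a surrogate key.
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer (customer_id) VALUES (?)",
            (row["customer_id"],),
        )
        (customer_key,) = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE customer_id = ?",
            (row["customer_id"],),
        ).fetchone()
        # Load step: write the fact row that points at the dimension row.
        conn.execute(
            "INSERT INTO fact_sales (sale_id, customer_key, amount) VALUES (?, ?, ?)",
            (row["sale_id"], customer_key, row["amount"]),
        )
    conn.commit()
    return max((r["sale_id"] for r in new_rows), default=last_loaded_id)


def sales_by_customer(conn):
    """A typical BI-style query: join the fact table to its dimension."""
    return conn.execute("""
        SELECT d.customer_id, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS d ON d.customer_key = f.customer_key
        GROUP BY d.customer_id
        ORDER BY d.customer_id
    """).fetchall()
```

Each call to incremental_load picks up only the rows that arrived since the previous run. That keeps any single load small, but it is also why the ETL layer needs constant attention: every new source or dimension adds another lookup-and-load step to the nightly window.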
As the BI ecosystem has matured, Big Data and the promise of Hadoop have been a hot topic in the technology space, but so far they have been more hype than reality. Analysts have estimated that the Big Data market (including cloud, hardware, software and services) will grow to over $50B by the early 2020s from $1B today. Although some of the most successful organizations, including Facebook, Amazon, Netflix and Google, have built their success on the back of Big Data, for much of Corporate America it has been a slow crawl. Considering how quickly newer technologies have been embraced in the consumer space (think DraftKings, Snapchat, Instagram), why has Big Data been slow to catch on? Some would say that without a specific use case that screams for a Big Data application, the initiative will fail. Others would say the technology is still immature and unproven, although organizations like Cloudera, which provides the wrapper and admin consoles for Hadoop, have legitimized the technology for commercial-scale use.
That raises the question: are we taking the right approach with Big Data? Use cases and vision are important, but they are also very limiting in scale and scope. Great, we have proven that Big Data will allow an insurance company to truly understand customer behavior, but what’s next, and where is the payback or ROI? Fantastic, sensor data is allowing companies to be better equipped to handle parts failure in a product, but what about the rest of the enterprise? This is where Big Data intersects with business intelligence and traditional data warehousing, and the two should have been linked together from the outset of the Big Data hype cycle. Business analytics and Big Data are striving for the same objective: to maximize the usage and analysis of all of an enterprise’s data assets. However, the technology foundations and structures are very different, with the Java-based, open source Hadoop on one side and the conventional relational models of traditional data warehousing on the other. Organizations and people tend to take the path of least resistance and stick with what they know, and the Hadoop skill-set market is still limited, though growing.
For more validation, Ralph Kimball, the father of modern data warehousing, has stated there is a better way than the design and data architectures he created. He believes that the future is Hadoop, the foundation of Big Data. Ultimately, what is most important is that any organization that has a successful business intelligence program, or is just starting out, should consider Hadoop as a core component of its modern data architecture and analytics program. HDFS, the Hadoop Distributed File System, can work alongside a traditional environment and in some cases replace parts of it, including the ETL component. At the very least, Hadoop can be tightly integrated with the conventional ecosystem, where it is likely to lower operating costs and improve performance. The relational database is not going away and will remain core to the modern data architecture, but Big Data is more than hype or promise, and it should be adopted as a key piece of the overall information architecture.