Chaos Sumo, a developer of object storage, has developed what it calls the first “intelligent” storage for Amazon Web Services S3. The technology, Chaos Sumo Smart Object Storage service, provides data discovery, management, and analytics as a way to simplify storing increasing amounts of data using the popular AWS S3 service, said Thomas Hazel, founder and chief technology officer for the Boston-based company.
Chaos Sumo embeds intelligence in S3 via its Data Edge platform, Hazel told CRN. “So many companies are doing data analytics on object storage,” he said. “I wanted to go beyond the noise. Everybody’s dumping things into S3 object storage, but they face several issues.” Chief among those issues is the fact that customers are dealing with so many buckets of storage in S3, Hazel said. “Bucket” is the AWS term for a logical unit of object storage.
“Chaos Sumo finds all those buckets,” he said. “There’s just a click to discover them, which can then be dragged to a virtual bucket without the need to build a new service or ETL [extract, transform, and load] the data to normalize it. Those virtual buckets provide an aggregated correlation between the different physical buckets, which can then be queried with the S3 API.”
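The virtual-bucket idea Hazel describes can be pictured with a minimal sketch. This is not Chaos Sumo's implementation or API, which the article does not detail; the `VirtualBucket` class, its methods, and the bucket names are hypothetical, and in-memory dictionaries stand in for real S3 buckets. It only illustrates how one logical view might aggregate several physical buckets without copying or transforming the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualBucket:
    """Hypothetical illustration: a named, read-only view over physical buckets."""
    name: str
    # Maps a physical bucket name to its objects (key -> bytes).
    # Plain dicts stand in for real S3 buckets in this sketch.
    sources: Dict[str, Dict[str, bytes]] = field(default_factory=dict)

    def attach(self, bucket_name: str, objects: Dict[str, bytes]) -> None:
        """Add a physical bucket to the view; no data is copied."""
        self.sources[bucket_name] = objects

    def list_keys(self, prefix: str = "") -> List[str]:
        """List objects across all attached buckets as 'bucket/key'."""
        return sorted(
            f"{bucket}/{key}"
            for bucket, objects in self.sources.items()
            for key in objects
            if key.startswith(prefix)
        )

    def get(self, qualified_key: str) -> bytes:
        """Fetch an object by its 'bucket/key' name."""
        bucket, _, key = qualified_key.partition("/")
        return self.sources[bucket][key]


# Two made-up physical buckets holding IoT data.
physical = {
    "sensors-raw": {"2017/a.json": b"{}"},
    "sensors-refined": {"2017/b.json": b"[]"},
}

vb = VirtualBucket("iot-view")
for bucket_name, objects in physical.items():
    vb.attach(bucket_name, objects)

print(vb.list_keys("2017/"))
```

In this sketch the physical buckets stay where they are, and the view simply resolves each request to the bucket that holds the object.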
Chaos Sumo could have created a new file store to handle those physical buckets, Hazel said. “But instead, we created a simple service to create virtual buckets without the need for coding or scaffolding,” he said, referring to the process of building a new Hadoop cluster.
A typical use is for IoT, where a customer might be looking to store data from multiple devices in AWS S3 because it is easy, elastic and cheap, Hazel said. A customer might want to discover what was stored and then refine it to present to apps for analysis before passing it to data scientists to organize the information, he said. “In a classic case, they would have to build a Hadoop cluster and hire a Redshift database administrator,” he said. “Now they get self-service. Data scientists can now use Data Edge to log into S3, find the data, group it as needed, and get results.”
Chaos Sumo increases the importance of the data scientist, Hazel said. “We let companies do more with their data scientists,” he said. “It may actually decrease the importance of data engineers or data administrators who might be sitting in the middle to organize the data before sending it to the data scientists, business analysts, and business intelligence people.”
Chaos Sumo has turned out to be a much better way to manage customers’ data lakes in AWS, said Kevin O’Rourke, co-founder and chief technology officer at JetSweep, a Chelmsford, Mass.-based solution provider and partner to both companies. Because of Hadoop and big data, the whole data lake concept has become sort of a “Wild West” as companies increasingly dump data into S3, O’Rourke told CRN.
“We looked at Hadoop distributions and products on top of Hadoop as a way to manage those data lakes,” he said. “But when we saw Chaos Sumo at an AWS meeting, we felt the company really understands this. We caught them early on in their development cycle, and even volunteered to be in their beta cycle.” S3 has several issues, not the least of which is the likelihood of getting lost in the content stored there, O’Rourke said. “There’s no single centralized management layer, and no single person software layer,” he said. “Chaos Sumo does it all nicely.”
Furthermore, O’Rourke said, a lot of products that integrate with S3, including third-party applications, need to map to specific buckets or folders as a connection. “Chaos Sumo has a concept of the virtual bucket which allows a set of virtual buckets or virtual folders to act as one,” he said.
Also, Chaos Sumo can integrate physical and virtual buckets together as a way to cleanse the data when bringing them together to discover what’s inside and do quality checks, he said. When Chaos Sumo talks about the concept of data lakes, a lot of people might think of a central repository, O’Rourke said. “But it’s really a logical repository,” he said. “There are buckets everywhere. This isn’t a large folder-like repository. Those buckets have different purposes. Some are refined, some are raw, and some are transient.”
Chaos Sumo provides channel partners the opportunity to differentiate themselves with Amazon S3 skills, Hazel said. A free edition of Data Edge is available that lets customers discover what data they have stored in S3, Hazel said. The premium edition, which allows discovery, organization, and query of the data, is available for about $100 per month for 10 GB or more of data processed during the month, a cost which rises into the thousands of dollars at terabyte capacities. “But it’s dramatically lower than the scaffolding or other services,” he said. “Redshift or building Hadoop clusters can be very expensive.”