What are the big data storage tools?

Last Update Time: 2021-04-30 10:49:44

Abstract: There are a lot of big data storage products on the market.

There are a large number of big data storage products on the market. Which are the best? Obviously, there is no simple answer: choosing a big data storage tool involves many variables, including the existing environment, the current storage platform, expected data growth, file sizes and types, and the mix of databases and applications.

Although this list is by no means complete, it covers several top big data storage tools worth considering.


Major competitors in the field of big data storage


Hitachi provides several big data storage products: big data analytics tools developed in cooperation with Pentaho Software, the Hitachi Super Scale-Out Platform (HSP) and its technical architecture, and the Hitachi Video Management Platform (VMP). The latter is aimed specifically at big video, a growing subset of big data, for video surveillance and other video-intensive storage applications.


Similarly, DataDirect Networks (DDN) has a number of solutions for big data storage.

For example, its high-performance SFA7700X file storage can automatically tier data to the WOS object storage and archiving system, supporting rapid ingest, concurrent analysis, and cost-effective retention of big data.

Michael King, DDN's senior director of marketing strategy and operations, said: "The Scripps Research Institute uses this product for cryo-EM, collecting more than 30 terabytes of data a week in the search for treatments for HIV, Ebola, Zika, and major neurological diseases. In the past, it took at least a year to examine a protein structure and the antibodies produced; cryo-EM completes the discovery process in a few weeks."
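The automatic tiering DDN describes, in which data that has gone cold on the fast file tier is migrated to the object archive, can be sketched generically. The threshold and catalog structure below are illustrative assumptions, not DDN's API:

```python
import time

# Hypothetical sketch of an age-based tiering policy: files not accessed
# within a threshold are flagged for migration from the fast file tier
# to the object/archive tier. Names and thresholds are illustrative.
COLD_AFTER_SECONDS = 30 * 24 * 3600  # migrate after 30 days idle

def plan_migrations(catalog, now=None):
    """catalog maps path -> last-access epoch; returns paths to archive."""
    now = now if now is not None else time.time()
    return [path for path, last_access in catalog.items()
            if now - last_access > COLD_AFTER_SECONDS]

catalog = {
    "/scratch/cryoem/run1.mrc": time.time() - 90 * 24 * 3600,  # 90 days idle
    "/scratch/cryoem/run2.mrc": time.time() - 3600,            # active
}
print(plan_migrations(catalog))  # → ['/scratch/cryoem/run1.mrc']
```

A real tiering engine would also track file size and tier costs; the point here is only that the policy is a simple scan over access metadata.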

Spectra BlackPearl

Spectra Logic's BlackPearl Deep Storage Gateway provides an object storage interface to SAS disks, SMR (shingled magnetic recording) disks, or tape; any of these technologies can sit behind BlackPearl in the storage environment.

Kaminario K2

Kaminario provides another big data storage platform. Although it does not offer a classic big data appliance, its all-flash array is finding a place in many big data applications.

Kaminario's chief technology officer Shachar Fienblit said: "As developers incorporate real-time analytics into their applications, storage infrastructure strategies must be able to manage big data analytics workloads as well as traditional transaction-processing workloads. The Kaminario K2 all-flash array was developed to support this dynamic workload environment."


Caringo was established in 2005 to unlock the value of data and solve the problems of protecting, managing, organizing, and searching data at scale. With its flagship product Swarm, users can achieve long-term storage, delivery, and analysis without migrating data to different solutions, thereby reducing total cost of ownership. It is used by more than 400 organizations around the world, such as the US Department of Defense, the Brazilian Federal Court System, the City of Austin, Telefonica, BT, Ask.com, and Johns Hopkins University.

Tony Barbagallo, Caringo's vice president of products, said: "To simplify data ingest into Swarm, we have FileFly (for Windows file servers and NetApp servers) and SwarmNFS (providing a fully functional NFSv4 implementation)."


The Infogix enterprise data analysis platform is built on a set of core capabilities: data quality, transaction monitoring, balancing and reconciliation, identity matching, behavior analytics, and predictive models. These features are said to help companies improve operational efficiency, generate new revenue, ensure compliance, and gain a competitive advantage. The platform can detect data errors in real time and automatically perform comprehensive analyses to optimize the performance of big data projects.

Avere Hybrid Cloud

Avere provides another big data storage solution. Its hybrid cloud platform is deployed across several use cases in hybrid cloud infrastructure. For NAS optimization, physical FXT clusters place an all-flash, high-performance tier in front of existing disk-based NAS systems. FXT clusters use caching to automatically accelerate active data, scale out for performance (adding more processors and memory) and capacity (adding more SSDs), and hide the latency of core storage that is sometimes deployed across a WAN. Users find it a good way to accelerate rendering, genome analysis, financial simulation, software tools, and binary code repositories.
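The core mechanism described here, a flash tier that caches active data and hides back-end latency, behaves like a read cache with least-recently-used eviction. A minimal sketch, assuming a generic `backend_read` callable standing in for the slow core storage (this is not Avere's implementation):

```python
from collections import OrderedDict

# Minimal sketch of the caching idea behind an acceleration tier like
# an FXT cluster (illustrative only): recently read blocks are served
# from local flash; misses fall through to slower core storage and are
# then cached, so repeat accesses avoid the WAN round trip.
class ReadCache:
    def __init__(self, capacity, backend_read):
        self.capacity = capacity
        self.backend_read = backend_read  # slow core-storage fetch
        self.cache = OrderedDict()

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)   # hit: served at local latency
            return self.cache[block_id]
        data = self.backend_read(block_id)     # miss: pay core latency once
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return data
```

In practice the working set of active data is far smaller than total capacity, which is why a cache tier (up to 480 TB in the FXT case, per the article) can front much larger core storage.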

In the private object-oriented file storage use case, users want to migrate from NAS to private object storage. They often like the efficiency, simplicity, and flexibility of private object stores, but not their performance or object-based API interfaces. Here, the FXT cluster improves the performance of private object storage in the same way it does in the NAS-optimization use case.

Jeff Tabor, senior director of product management and marketing at Avere Systems, said: "In addition, the FXT cluster presents a familiar NAS protocol and converts it to an object API on the storage side, so users can adopt object storage without rewriting applications or changing how data is accessed."
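The protocol translation Tabor describes amounts to mapping file paths onto object keys so applications keep their file semantics. A hedged sketch under invented names (the key scheme is illustrative; real gateways also translate metadata, directories, and locking):

```python
# Illustrative sketch of NAS-to-object translation: a gateway accepts
# NAS-style paths and stores payloads under flat object keys. The
# in-memory dict stands in for a bucket; the encoding is invented.
object_store = {}

def path_to_key(path):
    # Flatten the hierarchical path into a single flat object key.
    return path.lstrip("/").replace("/", "%2F")

def nas_write(path, data):
    key = path_to_key(path)
    object_store[key] = data
    return key

def nas_read(path):
    return object_store[path_to_key(path)]

key = nas_write("/projects/render/frame001.exr", b"pixels")
print(key)  # → projects%2Frender%2Fframe001.exr
```

The application above only ever sees the path; the object key exists solely on the storage side, which is the point of the gateway.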

Finally, the cloud storage use case is similar to the private object-oriented file storage use case, with the added benefit that companies can build fewer data centers and move data to the cloud. Latency is one of the challenges this use case must overcome, and it is exactly what the physical FXT cluster addresses: on access, data is cached locally on the FXT cluster, so all subsequent accesses enjoy low latency. FXT clusters can have a total cache capacity of up to 480 TB, so large amounts of data can be held locally to avoid cloud delays.


Big data is usually stored on local disks, which means that to remain efficient and scalable as a big data cluster grows, the logical relationship between compute and storage must be preserved. Two questions then arise: how do you separate the disks from the servers while keeping the same logical relationship between processor/memory combinations and drives? And how do you gain the cost, scale, and manageability benefits of a shared storage pool while still preserving the benefits of locality? DriveScale claims to do exactly this for Hadoop data storage.

However, storage professionals who want to provision and manage resources for big data applications are constrained by the Hadoop architecture, which is optimized for drives local to the server. As data volumes grow, the only option has been to buy more and more servers, not just to meet compute needs but also to provide more storage capacity. DriveScale lets users purchase storage capacity independently of compute capacity, so that capacity is right-sized at every level.

SK Vinod, DriveScale's vice president of product management, said: "There is no reason the advantages of the proprietary scale-up infrastructure everyone is accustomed to in the data center cannot be brought to a commodity scale-out environment. We give IT administrators a flexible tool for building and operating big data infrastructure, in which server and disk subsystems can be decomposed and recomposed in real time as needed. Individual drives are allocated to servers from a shared pool of JBOD-attached disks, eliminating disproportionate costs."
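The composition model Vinod describes, binding drives from a shared JBOD pool to servers on demand and returning them later, can be sketched as a simple allocator. This is an illustration of the disaggregation concept, not DriveScale's API:

```python
# Illustrative sketch of composing servers from a shared JBOD drive
# pool: compute and drives are purchased separately, then bound
# together on demand and returned to the pool when no longer needed.
class DrivePool:
    def __init__(self, drive_ids):
        self.free = set(drive_ids)
        self.bound = {}  # server name -> set of attached drives

    def attach(self, server, count):
        if count > len(self.free):
            raise RuntimeError("pool exhausted; add drives, not servers")
        picked = {self.free.pop() for _ in range(count)}
        self.bound.setdefault(server, set()).update(picked)
        return picked

    def detach(self, server):
        # Return this server's drives to the shared pool.
        self.free |= self.bound.pop(server, set())

pool = DrivePool([f"jbod0-d{i}" for i in range(24)])
pool.attach("hadoop-node-1", 6)
pool.attach("hadoop-node-2", 6)
print(len(pool.free))  # → 12
```

The contrast with classic Hadoop is that growing storage here means adding drives to the pool, not buying whole servers.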


The Hedvig Distributed Storage Platform provides a unified solution that lets you mix low-cost commodity hardware and high-performance storage to support any application, hypervisor, container, or cloud. It is said to provide block, file, and object storage for compute of any scale, to be programmable, and to support any operating system, hypervisor, or container. In addition, hybrid multi-site replication protects each application with a distinct disaster-recovery strategy and provides high availability through storage clusters spanning multiple data centers or clouds. Finally, advanced data services let users tailor storage with a set of enterprise services selectable per volume.

Avinash Lakshman, CEO and founder of Hedvig, said: "For Hadoop, being able to let HDFS handle some functions and the storage platform handle others is very important."
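The idea of per-volume data services and of splitting duties between HDFS and the storage platform can be sketched as a policy object attached to each volume. The policy fields and defaults below are hypothetical, chosen only to illustrate the concept:

```python
from dataclasses import dataclass

# Hypothetical sketch of "advanced data services selected per volume":
# each volume carries its own policy, applied at creation time. Field
# names and defaults are invented for illustration.
@dataclass
class VolumePolicy:
    replication_factor: int = 3
    deduplication: bool = False
    compression: bool = False

volumes = {}

def create_volume(name, **policy_overrides):
    volumes[name] = VolumePolicy(**policy_overrides)
    return volumes[name]

# HDFS replicates data itself, so the platform volume can skip it;
# a VM-image volume instead enables dedup and compression.
create_volume("hadoop-hdfs", replication_factor=1)
create_volume("vm-images", deduplication=True, compression=True)
print(volumes["hadoop-hdfs"].replication_factor)  # → 1
```

Lowering platform replication where HDFS already replicates is one concrete example of letting each layer handle the functions it does best.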


The Nimble Storage Predictive Flash Platform is said to significantly improve the performance of analytics applications and big data workloads. It does this by combining flash performance with predictive analytics to prevent the data-velocity barriers caused by IT complexity.


To learn more, our website has product specifications for data storage tools; visit ALLICDATA ELECTRONICS LIMITED for more information.