Editor's note: Guest author Richard Freitas is a computer scientist working on storage class memory at IBM Research - Almaden.
In our increasingly instrumented, interconnected and intelligent world, individuals and companies are struggling to cope with the explosive growth of data. According to a recent study, there will be 1,800 exabytes (EB) of digital data in 2011, up from 220 EB in 2007. Such growth puts data users under tremendous pressure to turn data into actionable insights quickly, strains the world’s IT infrastructure to its limits, and forces users to manage ever-larger volumes of data and storage with tools designed for easier times.
Existing storage management solutions struggle to provide timely backup, migration to appropriate performance tiers, replication and distribution. In many cases, users go without the kind of daily backup that industry experts would expect of a large data store. As new applications emerge in industries from financial services to healthcare, traditional systems will be unable to process data on this scale, leaving users exposed to critical data loss.
Anticipating these storage challenges decades ago, Almaden researchers created GPFS, a highly scalable, clustered parallel file system. Already deployed in enterprise environments to perform essential tasks such as file backup and data archiving on stores of one billion files, the technology overcomes the key challenge of managing unprecedentedly large file systems by combining multi-system parallelization with fast access to file system metadata stored on a solid-state storage appliance.
GPFS's advanced algorithms make full use of every processor core on all of these machines in all phases of the task: data read, sorting, and rules evaluation. GPFS exploits the excellent random-access performance and high data transfer rates of the 6.5 TB of solid-state metadata storage. The solid-state appliances sustain hundreds of millions of IO operations, while GPFS continuously identifies, selects and sorts the right set of files from the 10 billion-file file system. The selection was completed in 43 minutes using GPFS running on a cluster of ten 8-core systems and four Violin Memory solid-state memory arrays, 37 times the rate achieved in 2007 on a file system containing 1 billion files.
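To make the parallel selection concrete, here is a minimal, hypothetical sketch (in Python, not GPFS code) of the three phases described above: each worker reads file metadata for its slice of the namespace, evaluates a simple selection rule, and sorts its matches, after which the sorted partial lists are merged. The directory partitions, the 90-day rule, and the record fields are illustrative assumptions, not details of the actual system.

```python
# Illustrative sketch only -- not GPFS source code. It mimics the three
# phases described above (metadata read, rules evaluation, sorting) by
# fanning a directory scan out across worker processes and merging the
# sorted partial results.
import os
import time
import heapq
from multiprocessing import Pool

AGE_LIMIT = 90 * 24 * 3600  # example rule: files not accessed in 90 days

def scan_partition(root):
    """Read metadata under one subtree and apply the selection rule."""
    now = time.time()
    selected = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)                      # metadata read
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if now - st.st_atime > AGE_LIMIT:           # rules evaluation
                selected.append((st.st_atime, st.st_size, path))
    selected.sort()                                      # per-worker sort
    return selected

if __name__ == "__main__":
    # Hypothetical partitions of the namespace; in a real cluster each
    # worker node would scan a disjoint slice of the file system.
    partitions = ["/data/part0", "/data/part1", "/data/part2", "/data/part3"]
    with Pool(processes=len(partitions)) as pool:
        partial_results = pool.map(scan_partition, partitions)
    # Merge the per-worker sorted lists into one globally sorted candidate list.
    candidates = list(heapq.merge(*partial_results))
    print(f"{len(candidates)} files selected by the policy rule")
```

The point of the sketch is the shape of the work, not the code itself: the metadata reads and rule checks parallelize cleanly across nodes and cores, and only the relatively small sorted result lists need to be merged at the end, which is why fast random access to metadata on solid-state storage pays off so directly.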
You can read more about this IBM Research breakthrough here.