Indirect Data Migration in Large-Scale Heterogeneous Storage Systems

Document Type

Grant Proposal

Date Accepted

Summer 2013

Project Description/Abstract

IBM estimates that 2.5 quintillion bytes of data are created every day and that 90% of the data in the world today has been created in the last two years alone. This big data comes from search engine logs, social media sites, and digital media, to name a few sources. A large share of this data is kept not on personal computers but in large-scale storage systems such as data centres. A large-scale storage system can consist of several hundred to several thousand storage devices. The performance of such a system depends critically on an assignment of data to storage devices that balances the load across all devices; this assignment is called a data layout. Unfortunately, the optimal data layout changes frequently due to device failures and additions, so data must be moved across storage devices accordingly. Because the storage system performs sub-optimally during data migration, it is critical to move data to their target locations as quickly as possible.

In this work, a randomized algorithm for the data migration problem was designed. The algorithm takes into account the heterogeneous capabilities of storage devices (often overlooked in currently deployed solutions) and permits an indirect migration plan. Furthermore, the algorithm has provably near-optimal performance guarantees and can be implemented with acceptable overhead.
