Sage Ecosystem

Extreme Data Management

Exascale and data-intensive computing will be characterised by the need to manage and move extremely large volumes of data between multiple tiers of storage. SAGE provides a highly tiered storage platform that can be exploited by data management tools and utilities (e.g. Hierarchical Storage Management tools and data integrity checking tools). SAGE will research and develop adaptations and evolutions of existing data management mechanisms so that they manage data effectively in Percipient Storage. SAGE will also look into backward compatibility of the storage system with existing data access interfaces. Integrity checking of extreme amounts of data will also be addressed.
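As a rough illustration of integrity checking during data movement between tiers, the sketch below copies an object from one tier to another and verifies it with a checksum before removing the original. It is not SAGE code: the tiers are modelled as ordinary directories, and the names `sha256_of` and `migrate` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large objects never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(src: Path, dst_tier: Path) -> Path:
    """Copy src into dst_tier, verify the copy, then remove the original.

    Assumption: each tier is just a directory. On a checksum mismatch the
    copy is discarded and the original is left in place.
    """
    dst_tier.mkdir(parents=True, exist_ok=True)
    dst = dst_tier / src.name
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    if sha256_of(dst) != expected:
        dst.unlink()
        raise IOError(f"checksum mismatch while migrating {src.name}")
    src.unlink()
    return dst
```

Hashing in chunks keeps memory use constant regardless of object size, which matters once objects are far larger than a node's RAM.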

Extreme Data Analytics

Exascale data-centric computing has created a need to study advanced data analytics pipelines that go beyond the techniques suited to commercial big data analytics. New types of analytics are needed on both extreme volumes of external input data and simulation output data. SAGE will study Extreme Data Analytics methods beyond traditional MapReduce, including methods that exploit the deep tiers of non-volatile memory available in Percipient Storage.

Programming Methods

SAGE will provide advancements in programming models for Percipient Storage. Beyond a runtime system providing direct access to the different storage tiers and processing capabilities, the project will explore integration with standard programming models such as message passing (MPI) and PGAS. The offload of computational kernels to the different tiers of the SAGE I/O system will be investigated within MPI. SAGE furthermore explores PGAS and similar approaches that allow for global memory spaces integrating the virtual memory on each local compute node with the internal tier of NVRAM.
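The idea of folding a persistent tier into a node's virtual memory can be loosely illustrated with an ordinary memory-mapped file. The sketch below is only an analogy on a single node, not the SAGE runtime or a PGAS implementation: a regular file stands in for the NVRAM tier, and `open_region`, `store_counter`, `load_counter` and `REGION_SIZE` are hypothetical names.

```python
import mmap
import os
import struct

REGION_SIZE = 4096  # hypothetical region size (one typical page)


def open_region(path: str, size: int = REGION_SIZE) -> mmap.mmap:
    """Map a file into the address space, growing it to `size` if needed."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)
        # mmap keeps its own duplicate of the descriptor, so fd can be closed.
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)


def store_counter(region: mmap.mmap, value: int) -> None:
    """Write a 64-bit integer at offset 0 through the mapping."""
    struct.pack_into("<q", region, 0, value)
    region.flush()  # push the modified page back to the backing file


def load_counter(region: mmap.mmap) -> int:
    """Read the 64-bit integer stored at offset 0."""
    return struct.unpack_from("<q", region, 0)[0]
```

The point of the analogy is that the application reads and writes persistent data with memory-style access rather than explicit read and write calls; a PGAS runtime would extend such an address space across nodes.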

Optimization Tools

The new data-centric computing architecture enabled by SAGE also requires more sophisticated tools to analyse and debug the system so that applications can be further optimized. Tools will need to expose the true cost of I/O and storage to developers and users, something that existing tools do not capture accurately or flexibly. Debugging tools that support computational offload to storage also need to be researched and developed. SAGE addresses both of these issues.
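To make "exposing the cost of I/O" concrete, the following minimal sketch accumulates wall-clock time and bytes moved per labelled phase, so that I/O time can be compared against compute time. It is an illustrative assumption about what such instrumentation might look like, not a SAGE tool; `IOProfile`, `track` and `io_fraction` are hypothetical names.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class IOProfile:
    """Accumulates elapsed seconds and bytes moved per phase label."""
    seconds: dict = field(default_factory=dict)
    bytes_moved: dict = field(default_factory=dict)

    @contextmanager
    def track(self, label: str, nbytes: int = 0):
        """Time the enclosed block and add the result to `label`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.seconds[label] = self.seconds.get(label, 0.0) + elapsed
            self.bytes_moved[label] = self.bytes_moved.get(label, 0) + nbytes

    def io_fraction(self, io_labels) -> float:
        """Share of all tracked time that was spent in the given I/O labels."""
        total = sum(self.seconds.values())
        io = sum(self.seconds.get(label, 0.0) for label in io_labels)
        return io / total if total > 0 else 0.0
```

An application would wrap each phase, for example `with profile.track("checkpoint_write", nbytes=len(buf)):`, and report `profile.io_fraction(["checkpoint_write"])` at the end of the run. Wall-clock timing of this kind does not separate the tiers that served a request; doing so would need support from the storage system itself.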