Motivation

In the realm of Exascale I/O and data-centric computing, data generated by I/O-intensive applications and massive data generators may need to be not just stored, but also re-organised, transformed, reduced, queried and visualised. Complex associations within the data will need to be studied, and predictive analysis based on these associations will need to be performed. This analysis generates scientific insights that feed back into running simulations.

Hence there is a need for a system that is:
a. Capable of storing and retrieving data with very high throughput and low latency, at scales that are not possible with existing I/O solutions;
b. Capable of running complex data processing and analytics tasks in parallel with simulation, drastically reducing the time to solution by avoiding extreme data movement in the I/O stack between compute and stored data;
c. Capable of providing direct access to vast external data sets.

The SAGE project addresses all of these needs.