HPC-FAIR: A Framework Managing Data and AI Models for Analyzing and Optimizing Scientific Applications

PI: Chunhua Liao, Lawrence Livermore National Laboratory

Co-PIs: Xipeng Shen, North Carolina State University; Murali Emani, Argonne National Laboratory; Tristan Vanderbruggen, Lawrence Livermore National Laboratory

Supercomputers increasingly rely on hierarchical and heterogeneous node architectures for improved performance and efficiency. Artificial intelligence (AI) and machine learning (ML) techniques have been widely studied to address the challenges of running scientific applications productively and efficiently on supercomputers. These approaches are limited, however, because it is extremely difficult to generate, access, and maintain the high-quality training data needed to drive ML-based research. The overarching goal of this proposal is to develop a generic high-performance computing (HPC) data management framework, named HPC-FAIR, to make both the training data and the AI models of scientific applications findable, accessible, interoperable, and reusable. In particular, we will focus on the following innovations:

  • Collecting and generating a set of representative training data and AI models
  • Developing program pattern analysis and translation tools to augment, correlate, and annotate data from multiple sources
  • Designing an internal representation (called Data IR) that encodes both training data and AI models
  • Providing easy access and user interfaces
  • Investigating a workflow synthesizer that automatically converts user queries into optimized sequences of data processing function calls

The proposed project is anticipated to fundamentally improve the Findability, Accessibility, Interoperability, and Reusability (FAIR) of both training data and AI models, thereby driving innovation in AI and ML. The datasets and AI models from HPC-FAIR will also serve as common baselines for quickly, consistently, and fairly evaluating new AI models in terms of quality, complexity, and overhead. While we choose analyzing and optimizing scientific applications as the example domain, the developed techniques will be directly applicable to managing datasets and AI models in other scientific domains.