Explainable AI-Assisted Scientific Data Analysis Workflow

Besides accurately predicting the outcomes of high-dimensional non-linear functions, many state-of-the-art deep learning models can also be used to interpret and analyze the underlying physical phenomena in greater detail. By opening the so-called black box of these powerful machine learning models, we can extract insights about the application domain that the models learned during training. We incorporate recent advances from the fields of uncertainty quantification, interpretability, and explainability of deep learning models to design interactive visual analysis frameworks that support an exploratory analysis workflow for domain experts.

Multivariate Distribution Modeling for Visualizing and Analyzing Large-Scale Simulation Data Using Copula Functions

A popular and effective strategy for analyzing and visualizing large-scale scientific datasets in a scalable manner has been to use statistical probability distributions as an intermediate data representation. In this project, we propose a flexible distribution-driven analysis framework that models multivariate distributions efficiently using copula functions. Copula functions offer a statistically robust mechanism for modeling the dependency structure among variables, irrespective of the type of univariate marginal distribution used to model each individual variable. Using this copula-based framework, we address two major analysis and visualization challenges for large-scale simulations. First, for multivariate simulations, where multiple physical variables are computed in each simulation execution, we create in-situ multivariate data summaries, which are subsequently used for scalable post-hoc analysis and visualization tasks. Second, for ensemble simulations, where the same simulation is executed multiple times with different input parameter settings and/or initial conditions, we create multivariate distribution-based uncertainty models to facilitate analysis and visualization of uncertain features such as isosurfaces and vortices.
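To make the separation between marginals and dependency structure concrete, the following is a minimal sketch of a two-variable Gaussian copula, written with the Python standard library only. It is not the project's implementation: the choice of a Gaussian copula, the use of empirical (rank-based) marginals, and every name in it (`normal_scores`, `fit_gaussian_copula`, `sample_gaussian_copula`, the `temperature` and `pressure` variables) are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used for the copula transform


def normal_scores(values):
    """Map samples to standard-normal scores through their empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) keeps the pseudo-observation strictly inside (0, 1)
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def fit_gaussian_copula(x, y):
    """Estimate the copula correlation; the marginals play no role here."""
    return pearson(normal_scores(x), normal_scores(y))


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF: return the order statistic at level u."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_gaussian_copula(x, y, rho, count, rng):
    """Draw (x, y) pairs that follow the fitted dependency and marginals."""
    xs, ys = sorted(x), sorted(y)
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    samples = []
    for _ in range(count):
        # Correlated standard normals (2x2 Cholesky factor written out).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual * rng.gauss(0.0, 1.0)
        # Normal CDF gives dependent uniforms; quantiles restore marginals.
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        samples.append((empirical_quantile(xs, u1), empirical_quantile(ys, u2)))
    return samples


# Hypothetical data: two dependent variables with different marginals.
rng = random.Random(0)
temperature = [rng.gauss(300.0, 15.0) for _ in range(500)]
pressure = [math.exp(t / 100.0) + rng.gauss(0.0, 0.5) for t in temperature]

rho = fit_gaussian_copula(temperature, pressure)
synthetic = sample_gaussian_copula(temperature, pressure, rho, 5, rng)
print(f"copula correlation: {rho:.3f}")
print(synthetic)
```

In this sketch the data summary consists of the sorted marginal samples plus a single correlation value. A compact in-situ summary would more plausibly store histograms or parametric marginals per data block instead of raw samples, but the copula step itself would stay the same.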

Information-theoretic Framework for Visualizing Ensemble Simulations

Ensemble datasets are one of the primary sources of uncertain data in scientific studies. When modeling and measuring a real-world phenomenon via simulations, the lack of knowledge about the ground truth compels scientists to use multiple initial conditions and/or different input parameters to estimate the range of possible outcomes. The resulting ensemble datasets inform real-world decision making and are therefore of prime importance to scientists. Using information-theoretic measures such as mutual information, specific information, and conditional entropy, we proposed novel analysis techniques for understanding ensembles of isocontours in large-scale data generated by scientific simulations. We also proposed strategies to visualize the spatial variations of isosurfaces with respect to statistically significant isosurfaces within the ensemble. This helps in analyzing the influence of different ensemble runs over the spatial domain.
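The measures named above can be illustrated on discrete data with a short sketch. Several assumptions apply: each ensemble member is reduced to a binary label per grid point (1 where the value exceeds a hypothetical isovalue, 0 otherwise), the variable and function names are illustrative, and specific information is computed as H(Y) - H(Y | X = x), which is one of several definitions in the literature and may differ from the one used in this work.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_measures(xs, ys):
    """Entropy, conditional entropy, and mutual information of two labelings."""
    n = len(xs)
    h_x = entropy(c / n for c in Counter(xs).values())
    h_y = entropy(c / n for c in Counter(ys).values())
    h_xy = entropy(c / n for c in Counter(zip(xs, ys)).values())
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "H(Y|X)": h_xy - h_x,       # chain rule: H(X, Y) = H(X) + H(Y|X)
        "I(X;Y)": h_x + h_y - h_xy,
    }


def specific_information(xs, ys):
    """H(Y) - H(Y | X = x) for each observed x (assumed definition)."""
    n = len(xs)
    h_y = entropy(c / n for c in Counter(ys).values())
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return {
        x: h_y - entropy(c / len(g) for c in Counter(g).values())
        for x, g in groups.items()
    }


# Hypothetical per-grid-point labels from two ensemble members:
# 1 where the scalar value exceeds the isovalue, 0 otherwise.
member_a = [0, 0, 0, 1, 1, 1, 1, 0]
member_b = [0, 0, 1, 1, 1, 1, 0, 0]

print(information_measures(member_a, member_b))
print(specific_information(member_a, member_b))
```

Under this definition, the average of the specific information over x, weighted by p(x), equals the mutual information, so it shows how much each individual label value contributes to the shared information between two members.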