UNM CS Professor Receives Prestigious NSF CAREER Award
CAREER: Enabling Distributed and In-Situ Analysis for Multidimensional Structured Data
Advances in modern science have led to an explosion of data across all science, technology, engineering, and mathematics (STEM) disciplines. Extracting meaningful knowledge from this large pool of information has become both complicated and costly. In fields like genomics and astronomy, where very large volumes of data are produced daily, repositories must be stored across multiple, geographically distinct locations. Distributing data in this way results in expensive computations and incomplete analyses. In health informatics and finance, data is typically isolated within individual research centers due to privacy, security, or cost issues. Again, the inability to have a global view of the data yields inaccurate outcomes at computation time. The classic centralized approach to analyzing data no longer produces optimal results; it has become a major bottleneck, limiting the advantages Big Data has to offer. Current solutions for distributed analysis still lack generality, scalability, or accuracy.
This project aims to ameliorate problems in the management of distributed data while enabling scalable and accurate analyses. The project provides a comprehensive approach to handling data-to-knowledge extraction, representation, and learning at scale. Products of this research include: (1) an algorithmic suite of semantic projections and scalable learning methods for efficient data dimensionality reduction, pattern recognition, anomaly detection, and clustering, and (2) an open-source middleware for coupling distributed data acquisition processes with in-situ analytics and crowdsourcing. These products will be made available through a GitHub repository at https://github.com/distributedreasoningatunm. Moreover, the crowdsourcing extension doubles as an educational platform, which aims to attract interest in the STEM fields.
PI: Trilce Estrada-Piedra