Scientific Mission: Most of the world's information is unstructured: text, images, and sensory data. Accessing it in a meaningful way and extracting relevant content is inherently difficult because relevant concepts may be expressed in a wide variety of ways. The MIAS mission is to develop the theories, algorithms, and tools that let analysts and researchers access a variety of data formats, integrate unstructured data with existing resources, and transform raw data into useful and understandable information.

Educational Mission: Develop diverse human resources to enhance scientific research in MIAS areas. This is done through the Data Science Summer Institute, an intensive 8-week summer school at the University of Illinois at Urbana-Champaign that covers foundations, advanced-topic tutorials, and hands-on research projects.

Research Themes

Focused data retrieval and integration

  • Source Modeling: How to build a "map" of the online information space.
  • Data Crawling: How to gather data from multiple relevant sources in different formats.
  • Multimodal Data Retrieval: How to interactively retrieve data from the repository in a highly focused and integrated fashion.
  • Analyst-Centric Retrieval: Develop learning techniques to capture and track the information access context of an analyst and provide personalized support for information access and analysis unique to the needs of the user.


Semantic data enrichment

Handling the overwhelming array of different data formats, understanding data layout, inferring metadata for a variety of text sources and images, inferring semantic markup, and constructing and augmenting knowledge bases. Specific topics include: adaptation of entity extraction; text and images; building topic structure based on images and their captions; linking stories that use similar images; and building visual knowledge.


Entity identification and relationship discovery

  • Entity Uncertainty and Semantic Integration: Develop a robust and efficient semantic integration solution to the problem of entity uncertainty (mention matching).
  • Entity and Relationship Generation: Group matched mentions into entities; develop methods to extract relevant entity attributes.
  • Temporal Aspects and Entity and Relationship Maintenance: Maintain the extracted entity-relationship graph over time as the underlying raw data changes (e.g., part of the data is deleted or modified, or new data is added).
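
As a rough illustration of mention matching and of grouping matched mentions into entities, the sketch below compares mentions by normalized string similarity and merges matching pairs with a union-find structure. This is only a minimal example under stated assumptions, not the MIAS approach: the function names (`normalize`, `similar`, `group_mentions`), the 0.8 threshold, and the use of character-level similarity from Python's standard `difflib` module are all choices made here for illustration.

```python
from difflib import SequenceMatcher


def normalize(mention: str) -> str:
    """Lowercase, drop periods, and collapse whitespace (illustrative rules)."""
    return " ".join(mention.lower().replace(".", "").split())


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two mentions as a match if their similarity ratio is high enough."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def group_mentions(mentions, threshold: float = 0.8):
    """Group mentions into candidate entities via pairwise matching + union-find."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Quadratic pairwise comparison: fine for a sketch, too slow at scale.
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if similar(mentions[i], mentions[j], threshold):
                parent[find(j)] = find(i)

    groups = {}
    for i, mention in enumerate(mentions):
        groups.setdefault(find(i), []).append(mention)
    return list(groups.values())


if __name__ == "__main__":
    sample = ["IBM Corp.", "ibm corp", "University of Illinois"]
    for entity in group_mentions(sample):
        print(entity)
```

A practical system would replace the character-level similarity with context-aware features and would avoid comparing every pair of mentions; the sketch only shows the overall shape of the matching and grouping steps.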


Knowledge discovery and hypotheses generation and verification

Exploiting rich semantic structure generated by identifying entities and relationships among them to promote knowledge discovery and to generate hypotheses that emerge from "surprising" correlations or structural events.

  • Mining hidden links in massive information networks: Mining multiple, heterogeneous semantic links to discover crucial hidden relationships that may have strong impact on strategically important tasks.
  • Streaming Data: Developing effective methods for clustering and classifying high-dimensional and evolving data streams.
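
To make the streaming-data theme concrete, the sketch below clusters points one at a time: each arriving point joins the nearest existing cluster if it lies within a fixed radius, otherwise it starts a new cluster. This is a simple single-pass ("leader"-style) scheme chosen here for illustration, not a method taken from MIAS; the class name `OnlineClusterer` and the `radius` parameter are hypothetical, and the sketch does not address high dimensionality or forgetting of old data.

```python
import math


class OnlineClusterer:
    """Single-pass clustering: one centroid update per arriving point."""

    def __init__(self, radius: float):
        self.radius = radius   # max distance for joining an existing cluster
        self.centroids = []    # running mean of each cluster
        self.counts = []       # number of points absorbed by each cluster

    def update(self, point) -> int:
        """Assign one point to a cluster and return that cluster's index."""
        best, best_dist = None, math.inf
        for idx, centroid in enumerate(self.centroids):
            d = math.dist(point, centroid)
            if d < best_dist:
                best, best_dist = idx, d

        # No cluster is close enough: open a new one at this point.
        if best is None or best_dist > self.radius:
            self.centroids.append(list(point))
            self.counts.append(1)
            return len(self.centroids) - 1

        # Otherwise move the centroid toward the point (incremental mean).
        self.counts[best] += 1
        n = self.counts[best]
        centroid = self.centroids[best]
        for k in range(len(centroid)):
            centroid[k] += (point[k] - centroid[k]) / n
        return best


if __name__ == "__main__":
    clusterer = OnlineClusterer(radius=1.0)
    stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0)]
    for p in stream:
        print(p, "-> cluster", clusterer.update(p))
```

Because each point is seen once and only the centroids and counts are stored, memory use does not grow with the length of the stream. Handling evolving streams would additionally require some way to down-weight or retire stale clusters, which this sketch omits.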


Mathematical and Computational Foundations

Improving fundamental understanding in Machine Learning, Natural Language Processing, Database Theory, Combinatorial Optimization, Algorithmic Data Mining, and Probabilistic Modeling.