2007 Data Sciences Summer Institute

The Center for Multimodal Information Access and Synthesis Data Sciences Summer Institute (DSSI) offered 8 weeks of intensive study to more than 20 students of diverse interests and backgrounds in early summer 2007. The aim of the project was to give students a taste of the exhilaration that arises from academic discovery, first by ensuring that they had ample background knowledge, and then by giving them the opportunity to apply that knowledge in the context of a focused research objective. To this end, the Institute had four primary components:

A Foundations of Data Sciences course that consisted of approximately 40 contact hours and covered probabilistic and statistical modeling, data reduction, algorithms, and optimization. The course was offered during the first 4 weeks of the Institute.

A set of week-long (10-15 hour) mini-courses, or tutorials, whose purpose was to give students a more applied view of the discipline. Students selected 3 tutorials from among the following: Databases and Information Integration, Information Extraction, Machine Learning, Computer Vision, and Data Mining. The tutorials were taught by notable UIUC faculty and staff.

A Lecture Series by national experts, selected to give students a glimpse of academic and corporate research in a variety of sub-fields and to give our academic community the opportunity to reinforce collaborative relationships. For example, our honored guests included Dr. Ed Hovy of ISI and Dr. Chris Olston of Yahoo! Research. Guests typically provided three hours of lecture and a day or two of student and faculty consultations.

A group of team research projects planned and administered by UIUC faculty and senior graduate students in collaboration with faculty and students from the University of Texas, San Antonio. The three research projects varied dramatically in flavor: software tool development for Named Entity Recognition, infrastructure creation in support of innovation in Image Annotation, and fundamental research for a Virtual Web. The research projects were the focus of the second 4 weeks of the Institute. Each research team consisted of approximately 7 DSSI students, 2 graduate student leaders, and 1 faculty advisor.

Students and faculty involved in the project were in residence at UIUC for the duration of the DSSI. Student participant demographics were: 55% undergraduate, 45% early graduate; 32% UIUC, 68% non-UIUC; and 32% under-represented, 68% majority.

The DHS supports non-UIUC student participation and all staffing for the DSSI in the summers of 2008 and 2009. Ongoing financial challenges for the Institute include funding for UIUC student participation and the continuation of the DSSI beyond 2009.

Non-financial challenges for the DSSI include continued content development and refinement, annual research project selection, diverse student recruiting, and the development of a model for program growth.