Infosys Prize - Laureates 2019 - Prof. Sunita Sarawagi

Engineering and Computer Science

Sunita Sarawagi

Institute Chair Professor, Computer Science and Engineering, Indian Institute of Technology, Bombay

The Infosys Prize 2019 in Engineering and Computer Science is awarded to Prof. Sunita Sarawagi for her research in databases, data mining, machine learning and natural language processing, and for important applications of these research techniques. The prize recognizes her pioneering work in developing information extraction techniques for unstructured data.

Infographic:

What are the chances?

Scope and Impact of Work

Prof. Sunita Sarawagi’s research is based on the development of fundamental principles and has had profound practical impact. Both these characteristics can be illustrated using just two examples from Prof. Sarawagi’s many papers.

Postal addresses have a structure: country, state, PIN, city, street, and so on. However, postal addresses that appear on the web and in other repositories are continuous text and often have some of these attributes missing. A challenge is to convert such unstructured text into structured information, which is much more efficient for handling queries and other applications.

Previous work on this problem had taken largely ad hoc approaches that were often labor-intensive. Prof. Sarawagi extended the theory of Hidden Markov Models (HMM) to solve this problem automatically. She and her colleagues created a software package, DATAMOLD, which has been used by many companies to improve address structuring in India.

The second example is Sarawagi’s work on extracting numerical information from unstructured text containing numbers on the web. Examples of queries with numerical answers are: “What is Microsoft’s revenue?” and “How many calories in a pizza?” The queries are imprecise: What size pizza and with what toppings? What are the units: calories or kilocalories? Nevertheless, users post many such questions to search engines and expect an answer.

Prof. Sarawagi and her colleagues developed QuTree to deal with units, and a probabilistic model for collective consensus. They implemented a family of algorithms, tested them, and reported the results of their algorithms comparing them with ground truth (such as values reported by the World Bank). These papers exhibit a systematic approach to build foundational models and theories, and then develop software and carry out testing on critical problems.

Bio

Prof. Sunita Sarawagi is Institute Chair Professor in Computer Science and Engineering at IIT-Bombay.

Prof. Sarawagi received her B.Tech. in Computer Science from the Indian Institute of Technology, Kharagpur in May 1991. She received her M.S. and Ph.D. in Computer Science from the University of California at Berkeley where she studied under Michael Stonebraker.

Following her Ph.D. Sarawagi did stints at IBM Almaden Research Center as research scholar, Carnegie Mellon as visiting faculty, and joined IIT-Bombay in 1999.

Between July 2014 and July 2016 Prof. Sarawagi was Visiting Scientist at Google Inc. in Mountain View where she worked on deep learning models for personalizing and diversifying YouTube and Play recommendations, improving a conversation assistance engine, and extracting attributes of classes from the Knowledge Graph.

Among her many awards are Distinguished Alumni Award from IIT Kharagpur (2019), IBM Faculty award (2003 and 2008); Fellow of the Indian National Academy of Engineering (INAE) (2013).

Prof. Sarawagi has several patents to her name. They include a patent for “Database System and Method Employing Data Cube Operator for Group-By Operations” and a patent for “Efficient evaluation of queries with mining predicates”. Her topics of interest span several fields including machine learning, data analytics, databases, and statistics. Her current research interests are sequence models for text and time-series, domain adaptation, effective human intervention in learning, graphical models and structured learning.

Timeline

1991-1996

B.Tech. (Computer Science), IIT-Kharagpur; M.S. (Computer Science), UC Berkeley; Ph.D. (Computer Science), UC Berkeley

1996

Works as IBM Almaden Research Center

1999

Appointed Assistant Professor at IIT-Bombay

2003

Wins IBM Faculty Award

2004

Co-develops the Semi-Markov Conditional Random Field (Semi-CRF) model with William Cohen at Carnegie Mellon University

2013

Fellow of the Indian National Academy of Engineering

2014

Appointed Visiting Scientist at Google Inc.

2017

Appointed Institute Chair Professor, Computer Science at IIT-Bombay

2019

Awarded the Infosys Prize in Computer Science and Engineering

1991-1996

1996

1999

2003

2004

2013

2014

2017

2019

Jury Citation

Prof. Sunita Sarawagi was one of the earliest researchers to develop information extraction techniques that went beyond the world of structured databases to the kind of unstructured data one finds on the World Wide Web. This necessitated the use of novel machine learning techniques for extraction of information from natural language text. For example, Prof. Sarawagi and colleagues showed how one could extract and analyze unstructured numerical data in the web and other sources. She developed the formalism of semi-Markov conditional random fields for the task of segmenting out sequences of words which might correspond to “named entities” such as company names or job titles.

Sarawagi’s research has had valuable practical applications such as the development of software for cleaning and structuring Indian addresses, as well as de-duplicating them.

First Reaction

Prof. Sunita Sarawagi reacts to winning the Infosys Prize

"Congratulations Prof. Sunita Sarawagi for winning the Infosys Prize in Engineering and Computer Science. Your pioneering research in using machine learning to analyze and understand unstructured data makes it possible to use the wealth of information in the worldwide web and other sources for the betterment of society and for creating new businesses. You richly deserve this award.”

Arvind

Jury Chair

Computer Science and Engineering

Engineering and Computer Science

Sunita Sarawagi

Infographic:

What are the chances?

Scope and Impact of Work

Bio

Timeline

1991-1996

B.Tech. (Computer Science), IIT-Kharagpur; M.S. (Computer Science), UC Berkeley; Ph.D. (Computer Science), UC Berkeley

1996

Works as IBM Almaden Research Center

1999

Appointed Assistant Professor at IIT-Bombay

2003

Wins IBM Faculty Award

2004

Co-develops the Semi-Markov Conditional Random Field (Semi-CRF) model with William Cohen at Carnegie Mellon University

2013

Fellow of the Indian National Academy of Engineering

2014

Appointed Visiting Scientist at Google Inc.

2017

Appointed Institute Chair Professor, Computer Science at IIT-Bombay

2019

Awarded the Infosys Prize in Computer Science and Engineering

Jury Citation

Arvind

Quick links

Be the first to get our event mailers and stay updated!

About the site

Connect with

The Prize

Events

Media room

Infosys Science Foundation