The Human Data Interaction project at MIT aims to remove the costly bottlenecks in deriving predictive models from raw data. Over the past five years, we have developed technologies that enable automatic feature extraction from temporal, relational datasets (featuretools), auto-tuning of machine learning pipelines (deep mining), and the gathering of building blocks for data transformations that humans design based on their intuition and expertise. Having built fully automated systems that create predictive models from raw data, our team turned its focus to how humans describe and design prediction problems. In other words, how do they formulate their predictive inquiries? To this end, we designed a language called TRANE for describing prediction problems over relational datasets, and implemented a system that allows data scientists to specify problems in that language. We show that this language can describe prediction problems across many different domains, including those on KAGGLE, a data science competition website: we express 29 different KAGGLE problems in it.
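To make the idea of "describing a prediction problem over a relational dataset" concrete, here is a minimal sketch in Python. It is not TRANE's actual syntax or the system's API; the class `PredictionProblem`, its fields, and the filter, row-operation, and aggregation stages are hypothetical names chosen for illustration. The sketch assumes a problem can be stated as: for each entity, take its rows after a cutoff time, keep the rows of interest, map each to a number, and collapse those numbers into one label.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# One row of a relational table, keyed by column name.
Row = Dict[str, Any]


@dataclass
class PredictionProblem:
    """Hypothetical problem spec (not TRANE's API): filter -> row op -> aggregation, per entity."""
    entity_col: str                            # column identifying the entity (e.g. a customer)
    time_col: str                              # column holding each row's timestamp
    row_filter: Callable[[Row], bool]          # keep only rows of interest
    row_op: Callable[[Row], float]             # map each kept row to a number
    aggregate: Callable[[List[float]], float]  # collapse an entity's numbers into one label

    def label(self, rows: List[Row], cutoff: float) -> Dict[Any, float]:
        """Compute one label per entity from its rows strictly after `cutoff`."""
        groups: Dict[Any, List[Row]] = {}
        for row in rows:
            if row[self.time_col] > cutoff:
                groups.setdefault(row[self.entity_col], []).append(row)
        return {
            entity: self.aggregate([self.row_op(r) for r in entity_rows if self.row_filter(r)])
            for entity, entity_rows in groups.items()
        }


# "For each customer, how much will they spend on purchases over $10 after day 5?"
rows = [
    {"customer": 1, "day": 3, "amount": 20.0},
    {"customer": 1, "day": 6, "amount": 15.0},
    {"customer": 1, "day": 7, "amount": 5.0},
    {"customer": 2, "day": 8, "amount": 40.0},
]
problem = PredictionProblem(
    entity_col="customer",
    time_col="day",
    row_filter=lambda r: r["amount"] > 10,
    row_op=lambda r: r["amount"],
    aggregate=sum,
)
print(problem.label(rows, cutoff=5))  # {1: 15.0, 2: 40.0}
```

Keeping the stages separate means a new question is just a new combination of small pieces: swapping `sum` for `len` turns "how much will each customer spend" into "how many qualifying purchases will each customer make". In this sketch, entities with no rows after the cutoff receive no label at all.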

Publications

What would a data scientist ask? Automatically formulating and solving prediction problems (PDF)
Benjamin Schreck, Kalyan Veeramachaneni. IEEE International Conference on Data Science and Advanced Analytics, Montreal, Canada, October 2016.

Towards An Automatic Predictive Question Formulation (PDF)
Benjamin J. Schreck, M.Eng. thesis, MIT Dept. of EECS, June 2016. Advisor: Kalyan Veeramachaneni.

Contributors

Alex Nordin

Benjamin Schreck (2015-16)