Human-guided data analysis (Ihmisen ohjaama tiedonhaku) is an Academy Project at the Finnish Institute of Occupational Health, funded by the Academy of Finland, for the period from September 2015 to August 2019.

Current methods and processes of data analysis give knowledge workers, who are rarely experts in data analysis, only limited means to explore large heterogeneous data sets. We further develop and study a recently introduced formulation of the exploratory data analysis task in terms of statistical significance testing and constraints on the null hypothesis. The aim is to develop novel data analysis methods that are optimised for use by humans and that humans can control. We plan to demonstrate the methods with prototype systems that can be applied, e.g., to data sets collected at the Finnish Institute of Occupational Health.

People

  • Dr Kai Puolamäki, principal investigator
  • Dr Emilia Oikarinen
  • Dr Andreas Henelius
  • Dr Virpi Kalakoski
  • Dr Antti Ukkonen (currently at University of Helsinki)

Selected publications

  • Andreas Henelius, Kai Puolamäki, Antti Ukkonen. Interpreting Classifiers through Attribute Interactions in Datasets. In Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), pp. 8-13, 2017. Proceedings at https://arxiv.org/abs/1708.02666, paper at http://arxiv.org/abs/1707.07576
  • Jeremias Berg, Emilia Oikarinen, Matti Järvisalo, Kai Puolamäki. Minimum-Width Confidence Bands via Constraint Optimization. In Proceedings of the 23rd International Conference on Principles and Practice of Constraint Programming (CP 2017), to appear.
  • Jussi Korpela, Emilia Oikarinen, Kai Puolamäki. Multivariate Confidence Intervals. In Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 696-704, 2017. doi: 10.1137/1.9781611974973.78 Extended version is available at https://arxiv.org/abs/1701.05763
  • Andreas Henelius, Isak Karlsson, Panagiotis Papapetrou, Antti Ukkonen, Kai Puolamäki. Semigeometric Tiling of Event Sequences. ECML PKDD 2016, Part I, LNCS 9851, pp. 329-344, 2016. doi: 10.1007/978-3-319-46128-1_21
  • Kai Puolamäki, Bo Kang, Jefrey Lijffijt, Tijl De Bie. Interactive Visual Data Exploration with Subjective Feedback. ECML PKDD 2016, Part II, LNCS 9852, pp. 214-229, 2016. doi: 10.1007/978-3-319-46227-1_14
  • Bo Kang, Kai Puolamäki, Jefrey Lijffijt, Tijl De Bie. A Tool for Subjective and Interactive Visual Data Exploration. ECML PKDD 2016, Part III, LNCS 9853, pp. 3-7, 2016. doi: 10.1007/978-3-319-46131-1_1

Nothing — not the careful logic of mathematics, not statistical models and theories, not the awesome arithmetic power of modern computers — nothing can substitute here for the flexibility of the informed human mind… Accordingly, both [analysis] approaches and techniques need to be structured so as to facilitate human involvement and intervention.
– John W. Tukey & Martin B. Wilk, Data Analysis & Statistics, 1966

(We are always looking for good people. Please contact me if you are interested!)