Research

Semi-automatic Data Tours to Support Data Exploration and Visualisation Literacy for Novice Analysts

Benjamin Bach

In collaboration with:

Microsoft, Université Aix-en-Provence, University of Glasgow

1st July 2021 – 30th June 2023

Data analysis is crucial to understanding current phenomena, from social media to climate change, from the human brain to migration, and from diseases to political conflicts. Statistical analysis and modern machine learning are central to data analysis. Visualisation techniques and interactive interfaces complement them in two ways. First, they support human-in-the-loop control over these systems. Second, they support human sensemaking in cases where data is uncertain, where hypotheses must be generated from a broader overview, or where findings must be communicated effectively to larger audiences. Tools such as Tableau, Gephi and Microsoft's Power BI are increasingly democratising the use of data visualisation. However, using data visualisations to their full extent requires training novice analysts in tools, techniques and interactive exploration, as well as in communication and presentation.

This project aims to free analysts from the burden of exploring a data set from scratch while also having to select among tools, learn their workflows, and create visualisations themselves. Instead, it aims to support novice analysts through a system that automatically presents information about a data set while explaining both the visualisation techniques and the findings. In such a "data tour", the analyst starts as a passive reader, following a sequence of visualisations and textual explanations that describe each visualisation. As analysts become familiar with the visualisations and their data, they are invited to explore the data through an interactive interface and to tell the system which aspects they are most interested in.

Creating effective data tours draws on previous work on using comics for data-driven storytelling (datacomics.net) and on visualisation cheatsheets (visualizationcheatsheets.github.io). It also builds on research in data visualisation literacy, network data mining, and human-computer interaction.

To obtain concrete data sets and access to novice analysts for evaluating our tool, this project involves collaborators in history, archaeology, sociology and network science. Their data take the form of complex geo-temporal networks, including social networks, archaeological trading networks, family networks, and X (formerly Twitter) networks.

To create compelling data tours for these data sets, we still lack sufficient understanding of:

  • the exploration strategies analysts currently employ and the barriers to analysis they face,
  • ways of automatically extracting and annotating patterns of interest in networks, and
  • ways of creating meaningful explanatory sequences and high-level structures for data tours.

Our research follows a comprehensive approach that integrates field studies, visualisation and interface design, implementation, and user-centred evaluation. In a short first phase, we worked with humanities researchers to produce effective visualisations of their networks. In a second phase, we mine these data sets for insights and present them. In the final phase, we investigate how to structure and present these findings as data tours.

This research raises new questions about how far storytelling and the explanation of visualisations can be supported by intelligent agents, i.e., computer programs that partner with humans and engage them in dialogue. It may inspire new forms of intelligent interfaces that anticipate an analyst's tasks and understand their specific interests in the data. Researchers in the digital humanities, social sciences, and network analysis will benefit from better support for visualising their geo-temporal networks and from semi-automatic analysis methods. Together, these can lead to a better understanding of their data and to new collaborative research agendas built on visual analysis. The project also aims to inform commercial products and recommendation engines, and to provide companies with the knowledge and techniques to build customised data tours for their clients.