Keynote: Julia Vogt (ETH Zürich)
Title: Multimodal Machine Learning in Medicine
Abstract: In this talk, I will touch upon some of the challenges and opportunities that arise in the area of machine learning in medicine. I will put special emphasis on dealing with the multiple heterogeneous data types that naturally co-occur in medical practice. I will present different types of generative models for multimodal learning and demonstrate the need for multimodal models in medicine with several medical application examples.


Authors: Federico Ravenda, Seyed Ali Bahrainian and Fabio Crestani
Title: A Lightweight Self-Supervised Approach to do Topic Modelling
Abstract: Topic models are powerful unsupervised tools that identify the most prevalent themes of a text corpus across domains, from books and social media posts to newspapers and scientific articles. In this work, we propose a lightweight, self-supervised topic modelling framework that outperforms other state-of-the-art topic models. Our FasTM model leverages FastText embeddings to identify the most prominent keywords. It then uses a combination of these keywords and the originating document's representation to learn topic representations.
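The following is a minimal, illustrative sketch (not the authors' FasTM implementation) of the general idea described above: training FastText embeddings, ranking a document's words by similarity to the document centroid to pick prominent keywords, and combining the keyword and document representations. All function names, the toy corpus, and the hyperparameters are assumptions made for illustration.

    # Illustrative sketch only: scoring candidate keywords per document with
    # FastText embeddings and forming a combined document/keyword representation.
    # The real FasTM model and its training objective are not reproduced here.
    import numpy as np
    from gensim.models import FastText

    corpus = [
        ["topic", "models", "identify", "themes", "in", "text"],
        ["fasttext", "embeddings", "capture", "subword", "information"],
    ]

    # Train small FastText embeddings on the toy corpus (hyperparameters are arbitrary).
    ft = FastText(sentences=corpus, vector_size=32, window=3, min_count=1, epochs=20)

    def prominent_keywords(doc, k=3):
        """Rank a document's words by cosine similarity to the document centroid."""
        vecs = np.array([ft.wv[w] for w in doc])
        centroid = vecs.mean(axis=0)
        sims = vecs @ centroid / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(centroid) + 1e-9)
        order = np.argsort(-sims)[:k]
        return [doc[i] for i in order]

    def combined_representation(doc, k=3):
        """Average the keyword embeddings with the full-document centroid."""
        keywords = prominent_keywords(doc, k)
        kw_vec = np.mean([ft.wv[w] for w in keywords], axis=0)
        doc_vec = np.mean([ft.wv[w] for w in doc], axis=0)
        return (kw_vec + doc_vec) / 2.0

    print(prominent_keywords(corpus[0]))
    print(combined_representation(corpus[0]).shape)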


Authors: Fynn Firouz Faber and Raphael Waltenspül
Title: Understanding Data Modeling Needs: Information Retrieval in the GLAM Sector
Abstract: The content in galleries, libraries, archives and museums (GLAM) typically exists within a rich context that is highly relevant for information retrieval. Taking a photograph in an archive as an example, metadata such as the location of the shoot or the name of the photographer is just as important as the image data itself. This metadata is often expensively annotated by hand, leaving much valuable data untapped. Considering the recent advances in machine learning (ML), the obvious question is how to automate this process. More holistically, we consider three tasks: 1. coordinating the ingestion of data and context information, 2. augmenting the data with additional information, and 3. using all of the available information to satisfy an information need. To implement a system that executes these tasks, we must first decide on a data model that supports them. Here, we investigate how each task imposes requirements on such a shared data model and present some preliminary investigations. We intend to open a discussion rather than to present results.
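As a discussion starter, the sketch below shows one hypothetical shape such a shared data model could take, covering the three tasks above: contextual metadata captured at ingestion, ML-derived annotations added later, and a naive retrieval function over both. The class and field names are illustrative assumptions, not the data model under investigation.

    # Hypothetical sketch of one possible shared data model for GLAM content,
    # covering the three tasks from the abstract: (1) ingesting data with its
    # context, (2) augmenting it with ML-derived annotations, (3) querying it.
    # All class and field names are illustrative, not the authors' design.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Annotation:
        source: str             # "manual" or the name of the ML model that produced it
        label: str              # e.g. "portrait", "outdoor scene"
        confidence: float = 1.0

    @dataclass
    class ArchiveItem:
        item_id: str
        media_path: str                       # path or URI of the digitised object
        photographer: Optional[str] = None    # contextual metadata (task 1)
        location: Optional[str] = None
        annotations: list[Annotation] = field(default_factory=list)  # augmentation (task 2)

    def matches(item: ArchiveItem, query: str) -> bool:
        """Naive retrieval over metadata and annotations (task 3)."""
        q = query.lower()
        fields = [item.photographer or "", item.location or ""] + [a.label for a in item.annotations]
        return any(q in f.lower() for f in fields)

    photo = ArchiveItem("ph-001", "scans/ph-001.tif", photographer="A. Example", location="Basel")
    photo.annotations.append(Annotation(source="clip-classifier", label="portrait", confidence=0.87))
    print(matches(photo, "basel"), matches(photo, "portrait"))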


Authors: Valentyna Pavliv and Isabel Wagner
Title: Dark Patterns in Smart Toys: Automated Discovery and Remediation
Abstract: The new generation of connected toys, so-called smart toys, emerged from the evolution and democratisation of technologies such as the Cloud and accurate AI models. Thanks to synchronisation with a mobile application or even a direct network connection to remote servers, smart toys such as the Hello Barbie doll or the CogniToys Dino can communicate with children in meaningful dialogues, becoming true friends – or maybe enemies. Privacy issues in smart toys are particularly important because children are among the most vulnerable members of society. In particular, children may not be able to give meaningful informed consent, which may invalidate the legal basis for data collection and processing by smart toys. In addition, as children trust their toy friends, they are particularly prone to manipulation by them. In this work, we analyze a particular kind of manipulation in smart toys: dark patterns. Dark patterns are deceptive user interface designs that manipulate users into making choices they would not otherwise make, for example to influence purchasing or cookie acceptance decisions. We aim to contribute to the state of the art by analyzing the characteristics and prevalence of dark patterns designed especially for children.


Authors: Thomas Jakobsche and Florina M. Ciorba
Title: Bridging the Gap: From Queuing to Execution in High Performance Computing Systems
Abstract: High Performance Computing (HPC) systems are crucial for accelerating scientific discovery. Nevertheless, the scheduling of HPC jobs is often sub-optimal, leading to resource underutilization and long job wait times. Although HPC job schedulers, such as Slurm, automatically allocate available resources to incoming jobs, users must select appropriate queues for their jobs and accurately estimate resource needs, which leads to a number of scheduling challenges. Jobs may wait unnecessarily in the queue due to mismatched queue configurations or overestimated resource requests. This wastes HPC resources and slows down scientific discovery. This work seeks to understand the reasons behind job wait times, identify mismatched configurations between jobs and queues, and guide users towards more accurate resource requests. While most related work aggregates all job wait times, our novel approach splits wait time by specific wait reasons, offering insights often missed in prior research. Despite shortcomings in existing job schedulers, such as Slurm storing only the latest wait reason, and users' hesitation to adjust resource requests, this work aims to optimize job queue configurations and reduce job wait times. The ultimate goal is to enable more computations per second, thereby accelerating scientific discovery.
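As a rough illustration of this per-reason view (not the tooling used in this work), the sketch below groups currently pending Slurm jobs by the wait reason that squeue reports. It assumes the squeue CLI is available on the system; as noted above, Slurm exposes only the most recent reason per job.

    # Minimal sketch: count pending Slurm jobs per reported wait reason, to see
    # *why* jobs are waiting rather than only *how long* they have waited.
    import subprocess
    from collections import Counter

    def pending_jobs_by_reason():
        # %i = job id, %r = reason the job is still pending
        out = subprocess.run(
            ["squeue", "--noheader", "--states=PENDING", "-o", "%i|%r"],
            capture_output=True, text=True, check=True,
        ).stdout
        reasons = Counter()
        for line in out.splitlines():
            if not line.strip():
                continue
            _job_id, reason = line.split("|", 1)
            reasons[reason.strip()] += 1  # e.g. "Priority", "Resources", "QOSMaxCpuPerUserLimit"
        return reasons

    if __name__ == "__main__":
        for reason, count in pending_jobs_by_reason().most_common():
            print(f"{count:6d}  {reason}")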


Authors: Martin Vahlensieck, Marco Vogt and Heiko Schuldt
Title: Towards Adaptive Data Management Using Containerization
Abstract: Polypheny is a PolyDBMS that facilitates integration with multiple query languages and data storage models by leveraging existing database systems. While it is possible to set up each of these underlying databases by hand, we argue for the advantages of automatic deployment based on containerization. We show how automating these processes improves the user experience, data migration, and security of Polypheny, and we share details about the implementation.
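For illustration only (not Polypheny's actual deployment code), the sketch below shows how an underlying data store could be started programmatically with the Docker SDK for Python, so that the user never has to install or configure it by hand. The image name, credentials, and port mapping are arbitrary example values.

    # Illustrative sketch: automatic, containerized deployment of a data store
    # using the Docker SDK for Python (pip install docker).
    import docker

    def deploy_postgres(name: str = "polypheny-demo-postgres", host_port: int = 5433):
        client = docker.from_env()
        container = client.containers.run(
            "postgres:16",
            name=name,
            detach=True,
            environment={"POSTGRES_PASSWORD": "example-secret"},  # example credential only
            ports={"5432/tcp": host_port},                        # container port -> host port
        )
        return container

    if __name__ == "__main__":
        c = deploy_postgres()
        print(c.name, c.status)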