State-of-the-art technology
We work closely with universities and research institutions to help shape the cutting edge of technology.
Here you will find our latest scientific publications:
My Daughter Loves the New Pens: Quantifying the Patient Experience with Machine Reading and Applied Semantic Computing
Authors: Bichteler, A.; Collins, B. G.; Walter, S.; Wendler, K.; Koelling, J.; Loonus, Y.; Hoewelkroeger, J.; Matheus, C.; Jebbara, S.; Hommel, F.; Badmaeva, E.; Verissimo, S.; Mokbel, B.; Cimiano, P.; Hartung, M.
Conference: ISPOR 2019
Abstract: Real-world experience of disease treatment lies at the heart of patient centricity. Conventional methods of developing patient-reported outcome (PRO) instruments and value assessments are often costly, burdensome, or even impossible (e.g. in orphan diseases or pediatrics). Our goal was to generate patient insights from online forums on 1) Lupus Nephritis (LN) and 2) subcutaneous treatments in Crohn’s disease, with the confidence necessary for decision-making.
Learning Soft Domain Constraints in a Factor Graph Model for Template-based Information Extraction
Authors: Hendrik ter Horst, Matthias Hartung, Philipp Cimiano, Nicole Brazda, Hans Werner Müller, Roman Klinger
Journal: Data & Knowledge Engineering, Vol. 125, 101764
Abstract: The ability to accurately extract key information from textual documents is necessary in several downstream applications, e.g., automatic knowledge base population from text, semantic information retrieval, question answering, or text summarization. However, information extraction (IE) systems are far from error-free and in some cases commit errors that seem obvious to a human expert because they violate common sense or domain knowledge. Towards improving the performance of IE systems, we focus on the question of how domain knowledge can be incorporated into IE models to reduce the number of spurious extractions. Starting from the assumption that such domain knowledge cannot be incorporated explicitly and manually by domain experts due to the amount of effort and technical complexities involved, we propose a machine learning approach in which domain constraints are acquired as a byproduct of learning a model that learns to extract key information in a supervised setting. We frame the task as a template-based information extraction problem in which several dependent slots need to be automatically filled and propose a factor graph based approach to model the joint distribution of slot assignments given a text. Beyond using standard textual features in factors that score the compatibility of slot fillers in relation to the text, we use additional features that are text-independent and capture soft domain constraints. During the training process, these constraints receive a weight as part of the parameter learning process indicating how strongly a constraint should be enforced. These domain constraints are thus ‘soft’ in the sense that they can be violated, but the system learns to penalize solutions that violate them. The soft constraints we introduce come in two flavors. On the one hand, we incorporate information about the mean of numerical attributes and use features that indicate how far a certain value is from the mean. We call these features single slot soft constraints. On the other hand, we model the pairwise compatibility between slot filler assignments independent of the textual context, thus modeling the (domain) compatibility of the slot assignments. We call the latter ones pairwise slot soft constraints. As the main result of our work, we show that learning pairwise slot soft constraints improves the performance of our extraction model compared to single slot soft constraints by up to 6 points in F1 score, leading to F1=0.91 for individual template types. Further, the human-readable output format of our model enables the extraction and interpretation of the learned soft constraints. Based on this, we show in an evaluation by domain experts that more than 68% of the learned soft constraints are regarded as plausible.
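To illustrate the two kinds of soft constraints described in this abstract, here is a minimal sketch of how a template assignment might be scored. It is not the authors' implementation: the function names, the slot names (species, weight_g), the weights and the use of a standard deviation to scale the distance from the mean are all assumptions made for illustration. In the actual model, the weights would be learned during training rather than set by hand.

```python
from itertools import combinations


def single_slot_feature(value, mean, std):
    """Distance of a numeric slot value from the mean, in standard deviations (assumed scaling)."""
    return abs(value - mean) / std


def score_assignment(assignment, text_scores, stats, w_single, w_pair):
    """Unnormalized score of one template assignment (hypothetical helper).

    assignment:  dict mapping slot name -> filler
    text_scores: dict mapping (slot, filler) -> text-based compatibility score
    stats:       dict mapping numeric slot name -> (mean, std)
    w_single:    dict mapping slot name -> weight of its single slot soft constraint
    w_pair:      dict mapping frozenset of two (slot, filler) pairs -> pairwise weight
    """
    # Text-dependent part: how well each filler fits the text.
    score = sum(text_scores.get(item, 0.0) for item in assignment.items())

    # Single slot soft constraints: penalize numeric values far from the mean.
    for slot, filler in assignment.items():
        if slot in stats:
            mean, std = stats[slot]
            score += w_single.get(slot, 0.0) * single_slot_feature(filler, mean, std)

    # Pairwise slot soft constraints: text-independent compatibility of two fillers.
    for a, b in combinations(assignment.items(), 2):
        score += w_pair.get(frozenset((a, b)), 0.0)

    return score


# Hypothetical weights and statistics, chosen by hand for this example.
stats = {"weight_g": (250.0, 50.0)}
w_single = {"weight_g": -1.0}
w_pair = {frozenset({("species", "rat"), ("weight_g", 250.0)}): 0.5}

plausible = score_assignment({"species": "rat", "weight_g": 250.0}, {}, stats, w_single, w_pair)
implausible = score_assignment({"species": "rat", "weight_g": 2500.0}, {}, stats, w_single, w_pair)
```

With these hand-picked numbers, the assignment whose weight lies far from the mean receives the lower score. The constraint is soft because the assignment is penalized, not excluded.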
Extending Neural Question Answering with Linguistic Input Features
Authors: Fabian Hommel, Matthias Orlikowski, Philipp Cimiano, Matthias Hartung
Conference: SemDeep-5 co-located with the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)
Abstract: Considerable progress in neural question answering has been made on competitive general domain datasets. In order to explore methods to aid the generalization potential of question answering models, we reimplement a state-of-the-art architecture, perform a parameter search on an open-domain dataset and evaluate a first approach for integrating linguistic input features such as part-of-speech tags, syntactic dependency relations and semantic roles. The results show that adding these input features has a greater impact on performance than any of the architectural parameters we explore. Our findings suggest that these layers of linguistic knowledge have the potential to substantially increase the generalization capacities of neural QA models, thus facilitating cross-domain model transfer or the development of domain-agnostic QA models.
Zero-Shot Cross-Lingual Opinion Target Extraction
Authors: Soufian Jebbara, Philipp Cimiano
Abstract: Aspect-based sentiment analysis involves the recognition of so-called opinion target expressions (OTEs). To automatically extract OTEs, supervised learning algorithms are usually employed which are trained on manually annotated corpora. The creation of these corpora is labor-intensive, and sufficiently large datasets are therefore usually only available for a very narrow selection of languages and domains. In this work, we address the lack of available annotated data for specific languages by proposing a zero-shot cross-lingual approach for the extraction of opinion target expressions. We leverage multilingual word embeddings that share a common vector space across various languages and incorporate these into a convolutional neural network architecture for OTE extraction. Our experiments with 5 languages give promising results: we can successfully train a model on annotated data of a source language and perform accurate prediction on a target language without ever using any annotated samples in that target language. Depending on the source and target language pairs, we reach performances in a zero-shot regime of up to 77% of a model trained on target language data. Furthermore, we can increase this performance up to 87% of a baseline model trained on target language data by performing cross-lingual learning from multiple source languages.
A Guided Template-Based Question Answering System over Knowledge Graphs
Authors: Lukas Biermann, Sebastian Walter, and Philipp Cimiano
Conference: 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW 2018)
Identifying Right-Wing Extremism in German Twitter Profiles: a Classification Approach
Authors: Matthias Hartung, Roman Klinger, Franziska Schmidtke, Lars Vogel
Conference: 22nd International Conference on Applications of Natural Language to Information Systems (NLDB 2017)
Abstract: Social media platforms are used by an increasing number of extremist political actors for mobilization, recruiting or radicalization purposes. We propose a machine learning approach to support manual monitoring, aimed at identifying right-wing extremist content in German Twitter profiles. We frame the task as profile classification, based on textual cues, traits of emotionality in language use, and linguistic patterns. A quantitative evaluation reveals a limited precision of 25% with a close-to-perfect recall of 95%. This leads to a considerable reduction of the workload of human analysts in detecting right-wing extremist users.
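The link between the reported precision and recall and the analysts' workload can be made concrete with a small calculation. The sketch below is only an illustration: the 10,000 profiles and 100 truly extremist profiles are hypothetical numbers, not figures from the paper, and the function name is our own. Only the precision of 25% and the recall of 95% come from the abstract.

```python
def flagged_for_review(n_positive, precision, recall):
    """Number of profiles a classifier flags, given the true positives it must contain."""
    true_positives = recall * n_positive   # relevant profiles that are found
    return true_positives / precision      # all flagged profiles, including false alarms


# Hypothetical monitoring scenario (assumed numbers, not from the paper).
n_profiles = 10_000   # profiles an analyst would otherwise screen manually
n_extremist = 100     # profiles that are actually right-wing extremist

flagged = flagged_for_review(n_extremist, precision=0.25, recall=0.95)
workload_share = flagged / n_profiles   # fraction of profiles left to check by hand
```

Under these assumptions, analysts would review a few hundred flagged profiles instead of all 10,000 while still seeing 95% of the relevant ones. The actual saving depends on how rare extremist profiles are in the monitored population.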
Opinion Mining in Online Reviews About Distance Education Programs
Authors: Janik Jaskolski, Fabian Siegberg, Thomas Tibroni, Philipp Cimiano, Roman Klinger
Journal Publication
Abstract: The popularity of distance education programs is increasing at a fast pace. In line with this development, online communication between students in fora, social media and reviewing platforms is increasing as well. Exploiting this information to support fellow students or institutions requires extracting the relevant opinions in order to automatically generate reports providing an overview of the pros and cons of different distance education programs. We report on an experiment involving distance education experts with the goal of developing a dataset of reviews annotated with relevant categories and the aspects in each category discussed in the specific review, together with an indication of the sentiment. Based on this experiment, we present an approach to extract general categories and specific aspects under discussion in a review together with their sentiment. We frame this task as a multi-label hierarchical text classification problem and empirically investigate the performance of different classification architectures to couple the prediction of a category with the prediction of particular aspects in this category. We evaluate different architectures and show that a hierarchical approach leads to superior results in comparison to a flat model which makes decisions independently.
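The difference between a flat and a hierarchical decision can be sketched as follows. This shows one possible way to couple category and aspect predictions and is not necessarily any of the architectures evaluated in the paper. The category and aspect names, the probabilities and the threshold are hypothetical, and the probabilities stand in for the output of trained classifiers.

```python
def flat_decode(aspect_probs, threshold=0.5):
    """Flat model: every aspect is decided independently of its category."""
    return [aspect for aspect, p in aspect_probs.items() if p >= threshold]


def hierarchical_decode(category_probs, aspect_probs, hierarchy, threshold=0.5):
    """Hierarchical model: an aspect is only kept if its parent category is predicted."""
    result = {}
    for category, aspects in hierarchy.items():
        if category_probs.get(category, 0.0) < threshold:
            continue  # category rejected, so its aspects are skipped as well
        result[category] = [a for a in aspects if aspect_probs.get(a, 0.0) >= threshold]
    return result


# Hypothetical label hierarchy and classifier outputs for one review.
hierarchy = {
    "teaching": ["tutor_support", "materials"],
    "organization": ["exam_scheduling"],
}
category_probs = {"teaching": 0.9, "organization": 0.2}
aspect_probs = {"tutor_support": 0.8, "materials": 0.3, "exam_scheduling": 0.7}

flat = flat_decode(aspect_probs)
hierarchical = hierarchical_decode(category_probs, aspect_probs, hierarchy)
```

In this example the flat decoder keeps exam_scheduling even though its category is unlikely, while the hierarchical decoder drops it because the parent category was not predicted.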