State-of-the-art technology

We work closely with universities and research institutions to shape the cutting edge of technology.
Here, you will find our latest scientific publications.

"My Daughter Loves the New Pens": Quantifying the Patient Experience with Machine Reading and Applied Semantic Computing

Authors: Bichteler, A.; Collins, B. G.; Walter, S.; Wendler, K.; Koelling, J.; Loonus, Y.; Hoewelkroeger, J.; Matheus, C.; Jebbara, S.; Hommel, F.; Badmaeva, E.; Verissimo, S.; Mokbel, B.; Cimiano, P.; Hartung, M.

Conference: ISPOR 2019

Abstract: Real-world experience of disease treatment lies at the heart of patient centricity. Conventional methods of developing patient-reported outcomes (PROs) instruments and value assessments are often costly, burdensome, even impossible (e.g. in orphan diseases; pediatrics). Our goal was to generate patient insights from online forums on 1) Lupus Nephritis (LN), and 2) subcutaneous treatments in Crohn’s, with the confidence necessary for decision-making.

Learning Soft Domain Constraints in a Factor Graph Model for Template-based Information Extraction

Authors: Hendrik ter Horst, Matthias Hartung, Philipp Cimiano, Nicole Brazda, Hans Werner Müller, Roman Klinger

Journal: Data & Knowledge Engineering, Vol. 125, 101764

Abstract: The ability to accurately extract key information from textual documents is necessary in several downstream applications, e.g., automatic knowledge base population from text, semantic information retrieval, question answering, or text summarization. However, information extraction (IE) systems are far from being errorless and in some cases commit errors that seem obvious to a human expert as they violate common sense or domain knowledge. Towards improving the performance of IE systems, we focus on the question of how domain knowledge can be incorporated into IE models to reduce the number of spurious extractions. Starting from the assumption that such domain knowledge cannot be incorporated explicitly and manually by domain experts due to the amount of effort and technical complexities involved, we propose a machine learning approach in which domain constraints are acquired as a byproduct of learning a model that learns to extract key information in a supervised setting. We frame the task as a template-based information extraction problem in which several dependent slots need to be automatically filled and propose a factor-graph-based approach to model the joint distribution of slot assignments given a text. Beyond using standard textual features in factors that score the compatibility of slot fillers in relation to the text, we use additional features that are text-independent and capture soft domain constraints. During the training process, these constraints receive a weight as part of the parameter learning process indicating how strongly a constraint should be enforced. These domain constraints are thus ‘soft’ in the sense that they can be violated, but the system learns to penalize solutions that violate them. The soft constraints we introduce come in two flavors: on the one hand, we incorporate information about the mean of numerical attributes and use features that indicate how far a certain value is from the mean. We call these features single slot soft constraints. On the other hand, we model the pairwise compatibility between slot filler assignments independent of the textual context, thus modeling the (domain) compatibility of the slot assignments. We call the latter ones pairwise slot soft constraints. As the main result of our work, we show that learning pairwise slot soft constraints improves the performance of our extraction model compared to single slot soft constraints by up to 6 points in F1 score, leading to F1=0.91 for individual template types. Further, the human-readable output format of our model enables the extraction and interpretation of the learned soft constraints. Based on this, we show in an evaluation by domain experts that more than 68% of the learned soft constraints are regarded as plausible.
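The two kinds of soft constraints described in the abstract can be illustrated with a small scoring function. This is a minimal sketch, not the authors' model: the slot names, weights, statistics, and the helper names `single_slot_feature` and `score` are all hypothetical, and the learned factor graph is reduced here to a hand-weighted sum of features.

```python
from itertools import combinations


def single_slot_feature(value, mean, std):
    """Distance of a numeric slot value from its mean, in standard deviations."""
    return abs(value - mean) / std


def score(assignment, text_scores, stats, pair_weights, w_single):
    """Unnormalized score of one template assignment (higher is better).

    All arguments are illustrative assumptions:
      assignment   -- dict mapping slot name to filler
      text_scores  -- dict mapping (slot, filler) to a text-based score
      stats        -- dict mapping numeric slot name to (mean, std)
      pair_weights -- dict mapping a pair of (slot, filler) items to a weight
      w_single     -- weight of the single slot feature (negative = penalty)
    """
    total = 0.0
    for slot, filler in assignment.items():
        total += text_scores.get((slot, filler), 0.0)
        if slot in stats:
            mean, std = stats[slot]
            # Single slot soft constraint: text-independent, can be violated.
            total += w_single * single_slot_feature(filler, mean, std)
    # Pairwise slot soft constraints: compatibility of two fillers,
    # again independent of the text. Slots are sorted by name so that
    # each pair has one canonical key.
    for pair in combinations(sorted(assignment.items()), 2):
        total += pair_weights.get(pair, 0.0)
    return total


# Hypothetical example: two candidate fillers for a numeric slot.
stats = {"weight_g": (250.0, 50.0)}
pair_weights = {
    (("organism", "rat"), ("weight_g", 250.0)): 0.5,
    (("organism", "rat"), ("weight_g", 2500.0)): -2.0,
}
plausible = {"organism": "rat", "weight_g": 250.0}
implausible = {"organism": "rat", "weight_g": 2500.0}

for candidate in (plausible, implausible):
    print(candidate, score(candidate, {}, stats, pair_weights, w_single=-1.0))
```

With equal text scores, the candidate that violates both constraints receives the lower score but is not ruled out, which is what makes the constraints soft. In the paper the weights are learned during training rather than set by hand.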

Extending Neural Question Answering with Linguistic Input Features

Authors: Fabian Hommel, Matthias Orlikowski, Philipp Cimiano, Matthias Hartung

Conference: SemDeep-5 co-located with the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)

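Abstract: Considerable progress in neural question answering has been made on competitive general domain datasets. In order to explore methods to aid the generalization potential of question answering models, we reimplement a state-of-the-art architecture, perform a parameter search on an open-domain dataset and evaluate a first approach for integrating linguistic input features such as part-of-speech tags, syntactic dependency relations and semantic roles. The results show that adding these input features has a greater impact on performance than any of the architectural parameters we explore. Our findings suggest that these layers of linguistic knowledge have the potential to substantially increase the generalization capacities of neural QA models, thus facilitating cross-domain model transfer or the development of domain-agnostic QA models.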

Zero-Shot Cross-Lingual Opinion Target Extraction

Authors: Soufian Jebbara, Philipp Cimiano

Conference: 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019)

Abstract: Aspect-based sentiment analysis involves the recognition of so-called opinion target expressions (OTEs). To automatically extract OTEs, supervised learning algorithms are usually employed which are trained on manually annotated corpora. The creation of these corpora is labor-intensive and sufficiently large datasets are therefore usually only available for a very narrow selection of languages and domains. In this work, we address the lack of available annotated data for specific languages by proposing a zero-shot cross-lingual approach for the extraction of opinion target expressions. We leverage multilingual word embeddings that share a common vector space across various languages and incorporate these into a convolutional neural network architecture for OTE extraction. Our experiments with 5 languages give promising results: we can successfully train a model on annotated data of a source language and perform accurate prediction on a target language without ever using any annotated samples in that target language. Depending on the source and target language pairs, we reach performances in a zero-shot regime of up to 77% of a model trained on target language data. Furthermore, we can increase this performance up to 87% of a baseline model trained on target language data by performing cross-lingual learning from multiple source languages.
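The zero-shot idea rests on the shared embedding space: a model fitted on source-language vectors can be applied to target-language vectors unchanged. The sketch below illustrates only that principle. It swaps the paper's convolutional network for a simple nearest-centroid classifier, and the two-dimensional vectors, the vocabulary, and the labels `TARGET` and `O` are invented for illustration.

```python
import math

# Hypothetical toy embeddings. English and German words with similar
# meaning are assumed to lie close together in one shared space.
EMB = {
    "food": (0.90, 0.10), "service": (0.80, 0.20),
    "the": (0.10, 0.90), "was": (0.00, 1.00),
    "essen": (0.88, 0.12), "bedienung": (0.82, 0.18),
    "das": (0.12, 0.88), "war": (0.02, 0.98),
}


def centroid(words):
    """Component-wise mean of the embeddings of the given words."""
    vecs = [EMB[w] for w in words]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))


def cosine(a, b):
    """Cosine similarity of two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def train(labelled):
    """Fit one centroid per label from (word, label) pairs."""
    by_label = {}
    for word, label in labelled:
        by_label.setdefault(label, []).append(word)
    return {label: centroid(words) for label, words in by_label.items()}


def predict(model, word):
    """Assign the label whose centroid is most similar to the word vector."""
    return max(model, key=lambda label: cosine(model[label], EMB[word]))


# Train on English annotations only.
model = train([("food", "TARGET"), ("service", "TARGET"),
               ("the", "O"), ("was", "O")])

# Apply to German tokens for which no annotation was ever seen.
for token in ("das", "essen", "war", "bedienung"):
    print(token, predict(model, token))
```

The German tokens are labelled correctly only because their vectors sit near their English counterparts; the quality of the multilingual embeddings is therefore the limiting factor in such a setup.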

A Guided Template-Based Question Answering System over Knowledge Graphs

Authors: Lukas Biermann, Sebastian Walter, and Philipp Cimiano

Conference: 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW 2018)

Abstract: Considerable progress in neural question answering has been made on competitive general domain datasets. In order to explore methods to aid the generalization potential of question answering models, we reimplement a state-of-the-art architecture, perform a parameter search on an open-domain dataset and evaluate a first approach for integrating linguistic input features such as part-of-speech tags, syntactic dependency relations and semantic roles. The results show that adding these input features has a greater impact on performance than any of the architectural parameters we explore. Our findings suggest that these layers of linguistic knowledge have the potential to substantially increase the generalization capacities of neural QA models, thus facilitating cross-domain model transfer or the development of domain-agnostic QA models.

Identifying Right-Wing Extremism in German Twitter Profiles: a Classification Approach

Authors: Matthias Hartung, Roman Klinger, Franziska Schmidtke, Lars Vogel

Conference: 22nd International Conference on Applications of Natural Language to Information Systems (NLDB 2017)

Abstract: Social media platforms are used by an increasing number of extremist political actors for mobilization, recruiting or radicalization purposes. We propose a machine learning approach to support manual monitoring aiming at identifying right-wing extremist content in German Twitter profiles. We frame the task as profile classification, based on textual cues, traits of emotionality in language use, and linguistic patterns. A quantitative evaluation reveals a limited precision of 25% with a close-to-perfect recall of 95%. This leads to a considerable reduction of the workload of human analysts in detecting right-wing extremist users.
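A short calculation shows why 25% precision can still reduce analyst workload when recall is 95%. The precision and recall values come from the abstract; the number of profiles and the number of extremist profiles are hypothetical, and `review_load` is a helper name introduced here.

```python
def review_load(n_profiles, n_positive, precision, recall):
    """Estimate how many profiles an analyst must review after filtering.

    n_profiles and n_positive are assumed values, not figures from the paper.
    """
    true_pos = recall * n_positive   # relevant profiles the classifier finds
    flagged = true_pos / precision   # all profiles the classifier flags
    return {
        "flagged": round(flagged),
        "true_positives": round(true_pos),
        "missed": round(n_positive - true_pos),
        "share_reviewed": flagged / n_profiles,
    }


# Assumption: 10,000 monitored profiles, 100 of them right-wing extremist.
load = review_load(n_profiles=10_000, n_positive=100,
                   precision=0.25, recall=0.95)
print(load)
```

Under these assumed numbers, analysts would review 380 flagged profiles instead of 10,000 and would miss 5 of the 100 relevant ones. The actual saving depends on how rare extremist profiles are in the monitored population.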

Opinion Mining in Online Reviews About Distance Education Programs

Authors: Janik Jaskolski, Fabian Siegberg, Thomas Tibroni, Philipp Cimiano, Roman Klinger

Journal Publication

Abstract: The popularity of distance education programs is increasing at a fast pace. In line with this development, online communication between students in fora, social media and reviewing platforms is increasing as well. Exploiting this information to support fellow students or institutions requires extracting the relevant opinions in order to automatically generate reports providing an overview of the pros and cons of different distance education programs. We report on an experiment involving distance education experts with the goal of developing a dataset of reviews annotated with relevant categories and the aspects of each category discussed in the specific review, together with an indication of the sentiment. Based on this experiment, we present an approach to extract general categories and specific aspects under discussion in a review together with their sentiment. We frame this task as a multi-label hierarchical text classification problem and empirically investigate the performance of different classification architectures to couple the prediction of a category with the prediction of particular aspects in this category. We evaluate different architectures and show that a hierarchical approach leads to superior results in comparison to a flat model which makes decisions independently.
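The difference between a flat and a hierarchical decision can be sketched as follows. This is one possible way to couple the two levels, assumed here for illustration and not taken from the paper: aspects are only predicted inside categories that were themselves predicted. The category names, aspect names, scores, and threshold are hypothetical.

```python
def flat_predict(aspect_scores, threshold=0.5):
    """Decide every (category, aspect) label independently of the category."""
    return sorted(
        (category, aspect)
        for category, aspects in aspect_scores.items()
        for aspect, s in aspects.items()
        if s >= threshold
    )


def hierarchical_predict(category_scores, aspect_scores, threshold=0.5):
    """Predict aspects only within categories that pass the threshold."""
    labels = {}
    for category, c_score in category_scores.items():
        if c_score < threshold:
            continue  # category rejected, so its aspects are never considered
        labels[category] = [
            aspect
            for aspect, s in aspect_scores.get(category, {}).items()
            if s >= threshold
        ]
    return labels


# Hypothetical classifier scores for one review.
category_scores = {"teaching": 0.9, "costs": 0.2}
aspect_scores = {
    "teaching": {"tutors": 0.8, "materials": 0.3},
    "costs": {"fees": 0.6},
}

print(flat_predict(aspect_scores))
print(hierarchical_predict(category_scores, aspect_scores))
```

In this example the flat model emits the aspect "fees" although the review is unlikely to be about costs at all, while the hierarchical variant suppresses it.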
