Knowledge Centre

Gather additional industry and technical information from our repository of resources and educational content

Revolutionizing the Generation of Patient Insights using Generative AI

In this webinar, you can hear how Semalytix has deployed a cutting-edge Patient Insights platform that leverages GenAI. AstraZeneca presents use cases showing how they have successfully created insights, along with some of the challenges in adopting and driving use of Pharos & PatientGPT.
For more information – Request a demonstration!

Presenters: Donny Wong Ph.D., Jamuna Chakravati, Prof. Dr. Philipp Cimiano, Janik Jaskolski (2025)

White Paper: The Semalytix Patient Listening Methodology

Authors: Prof. Dr. Philipp Cimiano, Dr. Thomas Andreu, Janik Jaskolski MSc (2025)

Patient listening on social media for patient-focused drug development: a synthesis of considerations from patients, industry and regulators

Authors: Cimiano P, Collins B, De Vuono MC, Escudier T, Gottowik J, Hartung M, Leddin M, Neupane B, Rodriguez-Esteban R, Schmidt AL, Starke-Knäusel C, Voorhaar M, and Wieckowski K (2024)

Journal: Front. Med. 11: 1274688

Abstract: Reviews and discusses perspectives from relevant stakeholder groups (patients, industry, regulators) on current practice in social listening for PFDD, complemented by recommendations towards best practices.

Exploring the Perspectives of Patients Living With Lupus: Retrospective Social Listening Study

Authors: Spies E, Andreu T, Hartung M, Park J, Kamudoni P (2024)

Journal: In JMIR Form Res 8:e52768

Abstract: Full publication of the results from the lupus social listening project with Merck/EMD.

Using AI-based technology to gain insights from Osteoarthritis patients in UK via Social Listening

Authors: Neil Betteridge, Gudula Petersen, Thomas Andreu, Matthias Hartung (2023)

Journal: In Annals of the Rheumatic Diseases: Volume 82, Supp. 1, page 244

Abstract: Reports results from a social listening study on the patient experience with osteoarthritis with Grünenthal.

Exploration of Melanoma Patient-Generated Real-World Data Using an AI Based Social Listening Approach

Authors: Tadmouri A, Alivon M, Andreu T, Hartung M, Ryll B, Rauch G, Kiecker F, Cimiano P (2022)

Journal: In Value in Health 25(12): S476-S477

Abstract: Reports results from a social listening study with Pierre Fabre in the indication metastatic melanoma.

Retrospective Social Listening Study of Patients Living with Systemic Lupus Erythematosus (SLE): Understanding the Patient Experience

Authors: Spies E, Andreu T, Koelling J, Hartung M, Kamudoni P, Park J (2022)

Journal: In Value in Health 25(12): S438

Abstract: Reports results from a social listening study with Merck/EMD in the indication systemic lupus erythematosus (SLE).

An Exploratory Retrospective Social Listening Study to Identify Patient Experiences Associated with Cutaneous Lupus Erythematosus (CLE)

Authors: Spies E, Andreu T, Koelling J, Hartung M, Kamudoni P, Park J

Journal: In Value in Health 25(12): S393

Abstract: Reports results from a social listening study with Merck/EMD in the indication cutaneous lupus erythematosus (CLE).

Continuous Post-Market Real World Evidence Generation from Online Drug Reviews using Natural Language Processing

Authors: Matthias Hartung, Arne Kramer-Sunderbrink, Soufian Jebbara, Yannick Loonus, Bassam Mokbel, Philipp Cimiano

Journal: In Proceedings of IQWiG Information Retrieval Meeting (IRM), 2022.

Abstract: Short intro on generating real-world evidence about drug products post market entry from online drug reviews, based on the unpublished manuscript from Blum et al. (2021) below.

Intensity Prediction over Health-related Quality-of-Life Variables Extracted from Self-reported Patient Narratives

Authors: Tanjeb Tawhid, Philipp Cimiano, Matthias Hartung

Journal: In Proceedings of the Healthcare Text Analytics Conference (HealTAC), 2021

Abstract: Presents different technical approaches towards determining the importance of a certain QoL facet (or analogously severity of a symptom) as perceived by patients.

LLOD-driven Bilingual Word Embeddings Rivaling Crosslingual Transformers in Quality of Life Concept Detection from French Online Health Communities

Authors: Katharina Allgaier, Susana Veríssimo, Sherry Tan, Matthias Orlikowski, Matthias Hartung

Journal: In Alam, Mehwish et al. (eds.): Studies on the Semantic Web 53. Further with Knowledge Graphs. IOS Press: 89-102, 2021. (Nominated for Best Paper Award at SEMANTiCS 2021.)

Abstract: Presents a technical approach for classifying patient narratives in languages other than English into QoL facets mentioned (demonstrated here for French; the same approach was even more successful for Simplified Chinese – results still unpublished).

Automatically Analyzing Online Patient Experience Data with Natural Language Processing. An Instrument to Investigate Health Status and Help-Seeking Factors in Patients with Obesity

Authors: Matthias Hartung, Nathalie Schwering, Yannick Loonus, Philipp Cimiano, Anna Jäger, Ben Collins

Journal: In Qual Life Res 30(Suppl 1): S28, 2021

Abstract: Reports results from a patient experience study with Boehringer Ingelheim on patients with obesity, with a focus on patient-reported burden of disease and motivation for help-seeking.

Extracting patient-reported outcomes and side effects from online drug reviews for real-world evidence generation

Authors: Moritz Blum, Matthias Hartung, Philipp Cimiano

Journal: Unpublished pre-print (2021)

Abstract: Technical report on online drug review mining including use case descriptions for comparative effectiveness and non-adherence.

My Daughter Loves the New Pens: Quantifying the Patient Experience with Machine Reading and Applied Semantic Computing

Authors: Bichteler, A.; Collins, B. G.; Walter, S.; Wendler, K.; Koelling, J.; Loonus, Y.; Hoewelkroeger, J.; Matheus, C.; Jebbara, S.; Hommel, F.; Badmaeva, E.; Verissimo, S.; Mokbel, B.; Cimiano, P.; Hartung, M.

Conference: ISPOR 2019

Abstract: Real-world experience of disease treatment lies at the heart of patient centricity. Conventional methods of developing patient-reported outcomes (PROs) instruments and value assessments are often costly, burdensome, even impossible (e.g. in orphan diseases; pediatrics). Our goal was to generate patient insights from online forums on 1) Lupus Nephritis (LN), and 2) subcutaneous treatments in Crohn’s, with the confidence necessary for decision-making.

Learning Soft Domain Constraints in a Factor Graph Model for Template-based Information Extraction

Authors: Hendrik ter Horst, Matthias Hartung, Philipp Cimiano, Nicole Brazda, Hans Werner Müller, Roman Klinger

Journal: Data & Knowledge Engineering, Vol. 125, 101764

Abstract: The ability to accurately extract key information from textual documents is necessary in several downstream applications e.g., automatic knowledge base population from text, semantic information retrieval, question answering, or text summarization. However, information extraction (IE) systems are far from being errorless and in some cases commit errors that seem obvious to a human expert as they violate common sense or domain knowledge. Towards improving the performance of IE systems, we focus on the question of how domain knowledge can be incorporated into IE models to reduce the number of spurious extractions. Starting from the assumption that such domain knowledge cannot be incorporated explicitly and manually by domain experts due to the amount of effort and technical complexities involved, we propose a machine learning approach in which domain constraints are acquired as a byproduct of learning a model that learns to extract key information in a supervised setting. We frame the task as a template-based information extraction problem in which several dependent slots need to be automatically filled and propose a factor graph based approach to model the joint distribution of slot assignments given a text. Beyond using standard textual features in factors that score the compatibility of slot fillers in relation to the text, we use additional features that are text-independent and capture soft domain constraints. During the training process, these constraints receive a weight as part of the parameter learning process indicating how strongly a constraint should be enforced. These domain constraints are thus ‘soft’ in the sense that they can be violated, but the system learns to penalize solutions that violate them. The soft constraints we introduce come in two flavors: on the one hand we incorporate information about the mean of numerical attributes and use features that indicate how far a certain value is from the mean. We call these features single slot soft constraints. On the other hand, we model the pairwise compatibility between slot filler assignments independent of the textual context, thus modeling the (domain) compatibility of the slot assignments. We call the latter ones pairwise slot soft constraints. As main result of our work, we show that learning pairwise slot soft constraints improves the performance of our extraction model compared to single slot soft constraints by up to 6 points in F1 score, leading to F1=0.91 for individual template types. Further, the human readable output format of our model enables the extraction and interpretation of the learned soft constraints. Based on this, we show in an evaluation by domain experts that more than 68% of the learned soft constraints are regarded as plausible.

Extending Neural Question Answering with Linguistic Input Features

Authors: Fabian Hommel, Matthias Orlikowski, Philipp Cimiano, Matthias Hartung

Conference: SemDeep-5 co-located with the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)

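Abstract: Considerable progress in neural question answering has been made on competitive general domain datasets. In order to explore methods to aid the generalization potential of question answering models, we reimplement a state-of-the-art architecture, perform a parameter search on an open-domain dataset and evaluate a first approach for integrating linguistic input features such as part-of-speech tags, syntactic dependency relations and semantic roles. The results show that adding these input features has a greater impact on performance than any of the architectural parameters we explore. Our findings suggest that these layers of linguistic knowledge have the potential to substantially increase the generalization capacities of neural QA models, thus facilitating cross-domain model transfer or the development of domain-agnostic QA models.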

Zero-Shot Cross-Lingual Opinion Target Extraction

Authors: Soufian Jebbara, Philipp Cimiano

Conference: 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019)

Abstract: Aspect-based sentiment analysis involves the recognition of so called opinion target expressions (OTEs). To automatically extract OTEs, supervised learning algorithms are usually employed which are trained on manually annotated corpora. The creation of these corpora is labor-intensive and sufficiently large datasets are therefore usually only available for a very narrow selection of languages and domains. In this work, we address the lack of available annotated data for specific languages by proposing a zero-shot cross-lingual approach for the extraction of opinion target expressions. We leverage multilingual word embeddings that share a common vector space across various languages and incorporate these into a convolutional neural network architecture for OTE extraction. Our experiments with 5 languages give promising results: We can successfully train a model on annotated data of a source language and perform accurate prediction on a target language without ever using any annotated samples in that target language. Depending on the source and target language pairs, we reach performances in a zero-shot regime of up to 77% of a model trained on target language data. Furthermore, we can increase this performance up to 87% of a baseline model trained on target language data by performing cross-lingual learning from multiple source languages.

A Guided Template-Based Question Answering System over Knowledge Graphs

Authors: Lukas Biermann, Sebastian Walter, and Philipp Cimiano

Conference: 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW 2018)

Abstract: Considerable progress in neural question answering has been made on competitive general domain datasets. In order to explore methods to aid the generalization potential of question answering models, we reimplement a state-of-the-art architecture, perform a parameter search on an open-domain dataset and evaluate a first approach for integrating linguistic input features such as part-of-speech tags, syntactic dependency relations and semantic roles. The results show that adding these input features has a greater impact on performance than any of the architectural parameters we explore. Our findings suggest that these layers of linguistic knowledge have the potential to substantially increase the generalization capacities of neural QA models, thus facilitating cross-domain model transfer or the development of domain-agnostic QA models.

Identifying Right-Wing Extremism in German Twitter Profiles: a Classification Approach

Authors: Matthias Hartung, Roman Klinger, Franziska Schmidtke, Lars Vogel

Conference: 22nd International Conference on Applications of Natural Language to Information Systems (NLDB 2017)

Abstract: Social media platforms are used by an increasing number of extremist political actors for mobilization, recruiting or radicalization purposes. We propose a machine learning approach to support manual monitoring aiming at identifying right-wing extremist content in German Twitter profiles. We frame the task as profile classification, based on textual cues, traits of emotionality in language use, and linguistic patterns. A quantitative evaluation reveals a limited precision of 25% with a close-to-perfect recall of 95%. This leads to a considerable reduction of the workload of human analysts in detecting right-wing extremist users.

Opinion Mining in Online Reviews About Distance Education Programs

Authors: Janik Jaskolski, Fabian Siegberg, Thomas Tibroni, Philipp Cimiano, Roman Klinger

Journal Publication

Abstract: The popularity of distance education programs is increasing at a fast pace. On par with this development, online communication in fora, social media and reviewing platforms between students is increasing as well. Exploiting this information to support fellow students or institutions requires extracting the relevant opinions in order to automatically generate reports providing an overview of pros and cons of different distance education programs. We report on an experiment involving distance education experts with the goal of developing a dataset of reviews annotated with relevant categories and aspects in each category discussed in the specific review together with an indication of the sentiment. Based on this experiment, we present an approach to extract general categories and specific aspects under discussion in a review together with their sentiment. We frame this task as a multi-label hierarchical text classification problem and empirically investigate the performance of different classification architectures to couple the prediction of a category with the prediction of particular aspects in this category. We evaluate different architectures and show that a hierarchical approach leads to superior results in comparison to a flat model which makes decisions independently.
