Finding the Signal in the Noise: Composite Biomarkers in Breath Analysis
Breath is one of the most accessible, non-invasive biological matrices we have, rich in volatile organic compounds (VOCs) that reflect metabolic, inflammatory, and infectious processes. The potential is enormous: real-time diagnostics at the point of care, without needles or radiation. But breath is also noisy. It is highly variable, easily influenced by external factors, and composed of thousands of compounds, often present at parts-per-million or parts-per-billion concentrations. So how do we extract clinically useful information from such a turbulent medium?
At Breathomix, we believe the answer lies in composite biomarkers — not individual VOCs, but complex combinations that together form a disease-specific pattern.
Why Single VOCs Fall Short
While gas chromatography–mass spectrometry (GC-MS) and related techniques have helped identify individual VOCs linked to disease, the results are often inconsistent across studies. In most cases, no single molecule provides sufficient sensitivity or specificity to be used diagnostically. This is not surprising. Human biology — and disease — is rarely one-dimensional. Pathophysiological changes affect multiple systems and pathways, resulting in broad shifts in metabolic signatures rather than isolated changes in a single compound.
Moreover, the breath matrix itself is dynamic: affected by environmental exposure, diet, medication, comorbidities, and even circadian rhythm. Focusing on individual molecules in this context is like trying to diagnose a melody by measuring the loudness of a single note.
Composite Biomarkers: Pattern Recognition over Chemistry
Rather than isolating specific VOCs, our approach focuses on recognizing patterns in the VOC profile — statistical fingerprints that distinguish, for example, a lung cancer patient from a healthy individual. The SpiroNose®, our multi-sensor device, detects changes in VOC mixtures using metal oxide semiconductor (MOS) sensor arrays. Each sensor responds to a range of molecules with overlapping sensitivity, mimicking the human nose.
But just as the olfactory bulb needs the brain to interpret its signals, our sensor data needs sophisticated algorithms to extract meaningful patterns. That’s where BreathBase®, our analysis platform, comes in.
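To make the idea of overlapping sensitivity concrete, here is a minimal toy sketch. It assumes a purely linear response and an invented sensitivity matrix; real MOS sensors are non-linear, and none of the names or numbers below come from the SpiroNose itself. It shows that a change in one VOC moves every sensor, and that modest shifts in several VOCs change the shape of the overall response.

```python
import numpy as np

# Hypothetical sensitivity matrix (assumption): rows are sensors, columns are VOCs.
# Every sensor responds to every VOC, only with different weights.
SENSITIVITY = np.array([
    [0.9, 0.4, 0.1],
    [0.3, 0.8, 0.5],
    [0.2, 0.3, 0.9],
    [0.6, 0.6, 0.2],
])

def sensor_response(voc_levels):
    """Toy linear model: each sensor sums its weighted VOC contributions."""
    return SENSITIVITY @ np.asarray(voc_levels, dtype=float)

def unit_pattern(response):
    """Scale a response to unit length so only its shape (the pattern) remains."""
    return response / np.linalg.norm(response)

healthy = sensor_response([1.0, 1.0, 1.0])

# Raising a single VOC shifts the reading of all four sensors at once.
delta = sensor_response([1.0, 1.3, 1.0]) - healthy

# Modest shifts in all three VOCs change the shape of the response vector.
shifted = sensor_response([1.2, 0.7, 1.3])
similarity = float(unit_pattern(healthy) @ unit_pattern(shifted))

print("per-sensor change from one VOC:", delta)
print(f"cosine similarity of the two patterns: {similarity:.4f}")
```

Because no sensor maps to a single compound, the array cannot name individual VOCs; what it offers instead is a response pattern that an algorithm can learn to separate.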
From Raw Signal to Reliable Signature
Making sense of sensor output starts with rigorous signal processing and calibration. We apply:
- Factory calibration to ensure consistent sensor performance
- Cloud calibration to correct for subtle signal drift over time and across devices
- Ambient correction to compensate for environmental VOCs, measured with dedicated ambient sensors
- Preprocessing for normalization and noise reduction
- Feature selection to retain variables with the most discriminative information
Then we enrich the signal with context. Each breath profile is linked to comprehensive clinical metadata — including comorbidities, medication, diet, recent smoking or alcohol use, blood values, CT imaging, and lung function tests.
This integrated approach has resulted in a unique database of over 170,000 breath profiles, each paired with structured clinical annotations. Such scale and depth allow us to identify robust composite biomarkers that are resilient to real-world variability.
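The ambient correction, normalization, and feature selection steps listed above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the BreathBase implementation: the subtraction-based ambient correction, the unit-length normalization, the Fisher-score ranking, and all function names are choices made for this example. Factory calibration, cloud calibration, and noise reduction are left out, and the data is synthetic.

```python
import numpy as np

def ambient_correct(breath, ambient):
    """Subtract the matching ambient-sensor reading from each breath reading."""
    return breath - ambient

def normalize(profiles):
    """Scale each profile (row) to unit Euclidean length."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    return profiles / np.where(norms == 0, 1.0, norms)

def fisher_scores(profiles, labels):
    """Per-feature class separation: squared mean gap over summed variances."""
    a, b = profiles[labels == 0], profiles[labels == 1]
    gap = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    spread = a.var(axis=0) + b.var(axis=0)
    return gap / (spread + 1e-12)

def select_features(profiles, labels, k):
    """Return the sorted column indices of the k highest-scoring features."""
    return np.sort(np.argsort(fisher_scores(profiles, labels))[-k:])

# Synthetic data (assumption): 200 profiles from 7 sensors, two classes.
rng = np.random.default_rng(0)
n, n_sensors = 200, 7
labels = np.repeat([0, 1], n // 2)
ambient = rng.normal(1.0, 0.2, size=(n, n_sensors))
signal = rng.normal(5.0, 0.5, size=(n, n_sensors))
signal[labels == 1, 2] += 2.0   # only sensors 2 and 5 carry class information
signal[labels == 1, 5] -= 2.0
breath = signal + ambient       # measured breath includes the ambient background

clean = normalize(ambient_correct(breath, ambient))
selected = select_features(clean, labels, k=2)
print("selected sensor indices:", selected.tolist())
```

One design point worth noting: when a model is later validated, feature selection should be fitted on the training data only, otherwise information from the validation samples leaks into the model.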
Building and Validating Diagnostic Models
We never rely on a single algorithm. For each clinical question, we compare multiple modeling techniques, including linear discriminant analysis (LDA), partial least squares discriminant analysis (PLS-DA), random forests, gradient boosting, and neural networks. Agreement across techniques makes it less likely that a result is an artifact of one method overfitting, and increases confidence in the generalizability of the resulting model.
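A side-by-side comparison of this kind is usually run with cross-validation, so that every candidate is scored on the same held-out folds. The sketch below is a simplified illustration, not our production pipeline: it compares only two hand-written classifiers (a nearest-centroid rule as a stand-in baseline, and a two-class LDA that assumes equal class priors) on synthetic data, and every function name is hypothetical.

```python
import numpy as np

def fit_centroid(X, y):
    """Nearest-centroid baseline: store the mean profile of each class."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def predict_centroid(model, X):
    c0, c1 = model
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

def fit_lda(X, y):
    """Two-class LDA with a pooled covariance matrix and equal priors."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = (np.cov(X0, rowvar=False) * (len(X0) - 1)
              + np.cov(X1, rowvar=False) * (len(X1) - 1)) / (len(X) - 2)
    pooled += 1e-6 * np.eye(X.shape[1])      # small ridge for numerical stability
    w = np.linalg.solve(pooled, m1 - m0)
    b = -0.5 * w @ (m0 + m1)                 # boundary midway between class means
    return w, b

def predict_lda(model, X):
    w, b = model
    return (X @ w + b > 0).astype(int)

def cv_accuracy(fit, predict, X, y, k=5, seed=0):
    """Mean accuracy over k folds; a fixed seed gives every model the same folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        scores.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(scores))

# Synthetic data (assumption): 5 features, the first two separate the classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[y == 1, :2] += 2.0

results = {
    "nearest centroid": cv_accuracy(fit_centroid, predict_centroid, X, y),
    "LDA": cv_accuracy(fit_lda, predict_lda, X, y),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Cross-validated scores are still optimistic for whichever model is picked as the winner, which is one reason the chosen model must then be tested on data that played no part in the comparison.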
Model development follows a strict pipeline:
- Training on well-characterized internal data
- Internal validation to test model robustness
- External validation in independent cohorts from other centers
- Implementation studies to assess clinical utility in daily practice
One example is our lung cancer detection model, which has already undergone successful external validation. The next steps include an implementation study and health technology assessment (HTA) to evaluate its value in clinical care pathways.
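At each validation stage, the frozen model is scored on data it has not seen, typically in terms of sensitivity and specificity. The sketch below shows that calculation on a made-up external cohort of ten subjects. The labels and predictions are invented for illustration and are not results from the lung cancer model; the function name is hypothetical.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = disease."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # diseased, flagged
    fn = np.sum((y_true == 1) & (y_pred == 0))   # diseased, missed
    tn = np.sum((y_true == 0) & (y_pred == 0))   # healthy, cleared
    fp = np.sum((y_true == 0) & (y_pred == 1))   # healthy, flagged
    return tp / (tp + fn), tn / (tn + fp)

# Invented external cohort: 4 diseased and 6 healthy subjects.
external_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
external_preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(external_truth, external_preds)
print(f"sensitivity: {sens:.2f}, specificity: {spec:.2f}")
# -> sensitivity: 0.75, specificity: 0.83
```

The key point is that the model and its decision threshold are fixed before the external cohort is scored; tuning either on that cohort would turn the external validation into another round of training.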
Decoding the Complexity of Breath
Breath analysis will never be about a single VOC — and it doesn’t need to be. By combining intelligent sensing, rigorous calibration, large-scale annotated datasets, and machine learning, we can translate noisy breath data into clear clinical signals.
The future of breathomics lies in scale — not just of sensors, but of data. Only with large and diverse datasets can we build robust, generalizable diagnostic models that stand up to real-world variability.
References
1. de Vries R, Sterk PJ. eNose breathprints as composite biomarker for real-time phenotyping of complex respiratory diseases. Journal of Allergy and Clinical Immunology. doi:10.1016/j.jaci.2020.07.022