Case Study

sleep² in the Great Sleep Tracker Comparison – Accuracy on Par with Polysomnography

How accurate are sleep trackers really?

Sleep trackers and wearables are now widely used in everyday life and in research. But how reliable are the sleep data from Oura, Apple Watch, Fitbit, and others? A recent study (Topalidis et al., 2025) investigated this question systematically, testing eight widely used consumer sleep trackers (CST) under real-world conditions against polysomnography (PSG), the EEG-based gold standard in sleep medicine. For sleep², the result is clear: no other tested method classified sleep stages as accurately.

The study design: five nights, home PSG, targeted stress tests

Eighteen participants completed five consecutive nights (Monday to Friday) with ambulatory home polysomnography while simultaneously wearing two identical devices of each tracker. To test the algorithms under challenging conditions, the protocol included targeted sleep manipulations, such as shortened and extended sleep. The evaluation followed a standardized framework of epoch-by-epoch analysis (30-second segments) and a discrepancy analysis of key sleep parameters.

The following systems were tested: sleep² (with Polar Verity Sense and Polar H10), Oura Ring 3, Apple Watch Series 9, Fitbit Charge 6, Garmin Vivoactive 6 and Venu 3, WHOOP 4, and Circul+.

The results at a glance

Agreement with PSG was measured across all sleep stages and is expressed both as accuracy and as Cohen's κ, which corrects for chance agreement.

Device	Accuracy	Cohen's κ
sleep² (Polar H10)	84.0 %	0.76
sleep² (Polar Verity Sense)	83.7 %	0.76
Oura Ring 3	72.5 %	0.59
Apple Watch Series 9	72.3 %	0.56
Fitbit Charge 6	66.2 %	0.47
WHOOP 4	65.2 %	0.48
Garmin Vivoactive 6 and Venu 3	63.4 %	0.41
Circul+	55.6 %	0.33

Epoch-by-epoch accuracy and Cohen's κ compared to polysomnography. Note: the maximum achievable accuracy is around 88 %, which corresponds to the inter-rater reliability of PSG scoring. Source: Topalidis et al. (2025).
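To make the two metrics in the table concrete, here is a minimal sketch of how accuracy and Cohen's κ can be computed from two epoch-by-epoch stage sequences. The function name, the stage labels, and the ten example epochs are illustrative assumptions and are not taken from the study, whose analysis pipeline is not described here.

```python
from collections import Counter


def accuracy_and_kappa(reference, predicted):
    """Return (accuracy, Cohen's kappa) for two equally long label sequences.

    Hypothetical helper for illustration; not the study's evaluation code.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(reference)
    # Observed agreement: share of epochs where both sources give the same stage.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: for each stage, the product of its marginal frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / (n * n)

    if expected == 1.0:
        # Both sequences contain one identical label only; kappa is undefined.
        return observed, float("nan")
    return observed, (observed - expected) / (1.0 - expected)


# Ten made-up 30-second epochs (assumed data, for illustration only).
psg = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "REM", "W", "N2"]
tracker = ["W", "N2", "N2", "N2", "N3", "N2", "REM", "REM", "N2", "N2"]

acc, kappa = accuracy_and_kappa(psg, tracker)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

In this toy example the tracker labels most epochs as N2, so part of its agreement with the reference would be expected by chance alone. That is why κ comes out lower than the raw accuracy, and why the table reports both values.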


What the numbers mean

With a Cohen's κ of 0.76, sleep² achieves substantial agreement with PSG (by common convention, κ values from 0.61 to 0.80 count as substantial) and is the only system in the test field to reach this level. In contrast, most wrist-worn trackers overestimated total sleep time and markedly underestimated wake after sleep onset (WASO). This effect was particularly evident on atypical nights with fragmented, shortened, or extended sleep, which is precisely when accurate data matter most.

The cardiac-based sleep² measurements, taken with an armband and a chest strap, showed only minor deviations from PSG and remained stable even on challenging nights. The Oura Ring 3 and Apple Watch Series 9 achieved moderate accuracy but varied considerably between nights.
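The over- and underestimation described above can be illustrated with a small sketch that derives total sleep time (TST) and WASO from a hypnogram and compares tracker against PSG. It assumes 30-second epochs and defines WASO as wake epochs between the first and the last sleep epoch. Definitions vary between labs, and the function names and example data are hypothetical, not the study's discrepancy framework.

```python
EPOCH_MINUTES = 0.5  # one 30-second epoch


def sleep_parameters(hypnogram):
    """Compute TST and WASO in minutes from a list of stage labels ("W" = wake)."""
    sleep_idx = [i for i, stage in enumerate(hypnogram) if stage != "W"]
    if not sleep_idx:
        return {"tst_min": 0.0, "waso_min": 0.0}

    first, last = sleep_idx[0], sleep_idx[-1]
    # Assumed definition: wake epochs between sleep onset and final awakening.
    wake_epochs = sum(1 for stage in hypnogram[first:last + 1] if stage == "W")
    return {
        "tst_min": len(sleep_idx) * EPOCH_MINUTES,
        "waso_min": wake_epochs * EPOCH_MINUTES,
    }


def discrepancy(psg, tracker):
    """Tracker minus PSG; positive values mean the tracker overestimates."""
    ref = sleep_parameters(psg)
    est = sleep_parameters(tracker)
    return {key: est[key] - ref[key] for key in ref}


# Made-up example: the tracker misses part of a wake period and
# detects sleep onset one epoch too early.
psg = ["W", "W", "N1", "N2", "W", "W", "N2", "N3", "REM", "W"]
tracker = ["W", "N1", "N1", "N2", "N2", "W", "N2", "N3", "REM", "W"]

diff = discrepancy(psg, tracker)
print(diff)
```

Here the tracker reports more sleep and less wake than the reference, which is the same direction of error the study describes for most wrist-worn devices.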

Why sleep² measures so accurately

The difference lies in the sensors and in the deep-learning method. While many wearables derive their sleep stages from motion data, sleep² uses sensors worn close to the heart and records the heartbeat with millisecond precision as inter-beat intervals (IBI). This signal reflects the nocturnal regulation of the autonomic nervous system and keeps the measurement robust, even when sleep is restless or a bed partner moves next to you.
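As a rough illustration of what an IBI signal looks like, the following sketch converts beat timestamps into inter-beat intervals and computes RMSSD, a standard heart rate variability measure often linked to autonomic activity. This is an assumption-laden example only. The source does not say which features the sleep² deep-learning model uses, and the timestamps below are invented.

```python
import math


def inter_beat_intervals(beat_times_ms):
    """Differences between consecutive heartbeat timestamps, in milliseconds."""
    return [later - earlier for earlier, later in zip(beat_times_ms, beat_times_ms[1:])]


def rmssd(ibis_ms):
    """Root mean square of successive IBI differences (a common HRV metric)."""
    diffs = [later - earlier for earlier, later in zip(ibis_ms, ibis_ms[1:])]
    if not diffs:
        raise ValueError("need at least two inter-beat intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Invented beat timestamps in milliseconds (illustrative, not sleep² data).
beats = [0, 1000, 1950, 2990, 3980, 5010]

ibis = inter_beat_intervals(beats)
print(ibis)
print(round(rmssd(ibis), 1))
```

Because each interval depends on the exact timing of two consecutive beats, timing errors of a few milliseconds feed directly into such variability measures, which is why the precision of the sensor matters.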

Recommendation

To reliably capture sleep, one should rely on validated methods tested against PSG. Accuracy relative to the gold standard is the critical benchmark.
Interpret single-night values from wrist trackers with caution, especially during restless or unusually short or long sleep.
For research and healthcare, a standardized, IBI-based method with near-heart sensors, such as sleep², is recommended.

Sources:

Topalidis, P., Kogler, L., Mitterer, C., Hinterberger, A., Baron, S., Schabus, M., & ter Horst, R. (2025). Beyond the Hype? A Standardised Real-World Evaluation of Consumer Sleep Trackers (CST) in Extracting Sleep. PsyArXiv. https://doi.org/10.31234/osf.io/27wun_v1
