Expressive Input

WatchHand (CHI '26)


Posted on Feb. 9, 2026, 3:59 p.m.


WatchHand: Enabling Continuous Hand Pose Tracking on Off-the-Shelf Smartwatches


Jiwan Kim*, Chi-Jung Lee*, Hohurn Jung, Tianhong Catherine Yu, Ruidong Zhang, Ian Oakley†, and Cheng Zhang†
CHI '26: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems.
DOI: https://dl.acm.org/doi/10.1145/3772318.3790932
Dataset: https://github.com/witlab-kaist/WatchHand


Abstract

Tracking hand poses on wrist wearables enables rich, expressive interactions, yet remains unavailable on commercial smartwatches, as prior implementations rely on external sensors or custom hardware, limiting their real-world applicability. To address this, we present WatchHand, the first continuous 3D hand pose tracking system implemented on off-the-shelf smartwatches using only their built-in speaker and microphone. WatchHand emits inaudible frequency-modulated continuous waves (FMCW) and captures their reflections from the hand. These acoustic signals are processed by a deep-learning model that estimates the 3D positions of 20 finger joints. We evaluate WatchHand across diverse real-world conditions (multiple smartwatch models, wearing hands, body postures, noise conditions, and pose-variation protocols) and achieve a mean per-joint position error of 7.87 mm in cross-session tests with device remounting. Although performance drops for unseen users or gestures, the model adapts effectively with lightweight fine-tuning on small amounts of data. Overall, WatchHand lowers the barrier to smartwatch-based hand tracking by eliminating additional hardware while enabling robust, always-available interactions on millions of existing devices.
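
To make the FMCW sensing principle concrete, here is a minimal NumPy sketch: an inaudible chirp, a simulated hand echo, and the dechirp-and-FFT step that turns echo delay into distance. Every parameter below (sampling rate, sweep band, chirp length, reflection model) is an illustrative assumption, not the paper's actual signal design, and the learned pose model is not reproduced here.

import numpy as np

# Illustrative FMCW parameters (assumptions, not WatchHand's actual values).
FS = 48_000                 # audio sampling rate (Hz)
F0, F1 = 18_000, 21_000     # inaudible sweep band (Hz)
T_CHIRP = 0.01              # chirp duration (s)
C = 343.0                   # speed of sound (m/s)

t = np.arange(int(FS * T_CHIRP)) / FS
k = (F1 - F0) / T_CHIRP                              # sweep rate (Hz/s)
tx = np.cos(2 * np.pi * (F0 * t + 0.5 * k * t**2))   # transmitted chirp

# Simulate an echo from a hand ~5 cm from the watch (10 cm round trip).
delay_n = int(round((0.10 / C) * FS))
rx = 0.3 * np.roll(tx, delay_n)      # attenuated, delayed copy (toy model)

# Dechirping: mixing TX with RX yields a beat tone whose frequency is
# proportional to the echo delay, i.e. to the reflector's distance.
beat = tx * rx
spectrum = np.abs(np.fft.rfft(beat * np.hanning(beat.size)))
freqs = np.fft.rfftfreq(beat.size, d=1 / FS)
f_beat = freqs[1:][np.argmax(spectrum[1:])]          # skip the DC bin
print(f"estimated distance: {100 * f_beat * C / (2 * k):.1f} cm")

The headline accuracy figure is a mean per-joint position error (MPJPE): the average Euclidean distance between predicted and ground-truth joints. A one-line version, assuming (hypothetically) joint arrays of shape (n_frames, 20, 3) in millimetres:

def mpjpe(pred, gt):
    # Euclidean error per joint, averaged over joints and frames.
    return np.linalg.norm(pred - gt, axis=-1).mean()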


Short Summary Video

To appear


BibTeX

To appear