A touchless calculator that lets you build and evaluate math expressions using hand gestures captured via webcam. It targets sterile, accessibility, and educational settings where touching a device is impractical.
Traditional calculator interfaces require physical contact with keyboards, mice, or touchscreens. In medical environments (operating rooms, labs), educational settings (classrooms), and accessibility contexts (users with motor impairments), touchless interaction offers significant value. However, existing touchless input methods are often expensive or unreliable, or they require specialized hardware.
A real-time computer vision application that uses a standard webcam to track hand landmarks via MediaPipe, maps finger counts to digits (0-9) and operators (+, -, *, /, =), and evaluates mathematical expressions as they are built. No specialized hardware is required: just a webcam and a standard Python environment.
MediaPipe Hands detects 21 3D landmarks per hand in real-time. The pipeline processes each webcam frame, identifies hand presence, and extracts landmark coordinates for finger counting.
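The finger-counting step can be sketched as a pure function over the 21 landmarks. This is a minimal sketch rather than the project's actual code: the function name `count_extended_fingers` is hypothetical, the hand is assumed to be roughly upright in the frame, and the thumb test (tip farther from the pinky base than the thumb's IP joint) is one common heuristic among several. The landmark indices follow the MediaPipe Hands numbering (fingertips at 4, 8, 12, 16, 20).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # normalized (x, y); y grows downward in image space

# Landmark indices from the MediaPipe Hands model.
FINGER_TIPS = (8, 12, 16, 20)   # index, middle, ring, pinky fingertips
FINGER_PIPS = (6, 10, 14, 18)   # the PIP joint of each of those fingers
THUMB_TIP, THUMB_IP, PINKY_MCP = 4, 3, 17


def count_extended_fingers(landmarks: Sequence[Point]) -> int:
    """Count extended fingers on one hand (assumes an upright hand)."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")

    # A finger counts as extended when its tip is above its PIP joint.
    count = sum(
        1
        for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
        if landmarks[tip][1] < landmarks[pip][1]
    )

    # Heuristic (an assumption): the thumb is extended when its tip lies
    # farther from the pinky base than its IP joint does.
    pinky_base = landmarks[PINKY_MCP]
    tip_dist = math.dist(landmarks[THUMB_TIP], pinky_base)
    ip_dist = math.dist(landmarks[THUMB_IP], pinky_base)
    if tip_dist > ip_dist:
        count += 1
    return count
```

With the legacy `mediapipe.solutions.hands` API, the input could be built from each detected hand as `[(p.x, p.y) for p in hand.landmark]`. Tilted or upside-down hands would need a rotation-aware test.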
One hand: the number of extended fingers maps to digits 0-5. Two hands: the combined finger count maps to digits 6-9 and operators. A thumbs-up gesture triggers evaluation, and an open palm clears the expression.
A 1.25-second cooldown prevents duplicate inputs from gesture hold. The system only registers new input when fingers change state and the cooldown has elapsed.
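The two conditions (gesture changed, cooldown elapsed) can be sketched as a small state machine. This is an illustrative sketch, not the project's implementation: the class name `GestureDebouncer` and the injectable `clock` parameter are hypothetical, and resetting the remembered gesture when no hand is visible is an assumption that lets the same symbol be entered twice by lowering the hand in between.

```python
import time
from typing import Callable, Optional


class GestureDebouncer:
    """Accept a gesture only if it changed AND the cooldown has elapsed."""

    def __init__(
        self,
        cooldown_s: float = 1.25,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._last_gesture: Optional[str] = None
        self._last_accept = float("-inf")  # so the first gesture is accepted

    def update(self, gesture: Optional[str]) -> Optional[str]:
        """Feed the gesture seen in the current frame; return it if accepted."""
        if gesture is None:
            # Assumption: no hand in frame resets the remembered gesture.
            self._last_gesture = None
            return None

        now = self._clock()
        changed = gesture != self._last_gesture
        cooled_down = (now - self._last_accept) >= self.cooldown_s
        if changed and cooled_down:
            self._last_gesture = gesture
            self._last_accept = now
            return gesture
        return None
```

Calling `update()` once per processed frame and appending any non-`None` result to the expression gives one input per deliberate gesture, regardless of frame rate.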
Expressions are built character-by-character. When the user signals "evaluate," the expression string is parsed and computed using Python's eval with restricted scope for safety.
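One way to restrict `eval` is sketched below. It is an assumption about how the restriction might look, not the project's code: the function name `evaluate_expression` is hypothetical, and the character whitelist and the rejection of `**` (which could otherwise be used to build enormous numbers) are extra guards beyond an empty builtins scope.

```python
import re
from typing import Optional, Union

Number = Union[int, float]

# Only digits, the four basic operators, a decimal point, and spaces.
_ALLOWED = re.compile(r"[0-9+\-*/. ]+")


def evaluate_expression(expr: str) -> Optional[Number]:
    """Evaluate a basic arithmetic string; return None if it is invalid."""
    if not _ALLOWED.fullmatch(expr):
        return None
    if "**" in expr:
        # Exponentiation is not part of the gesture set and can blow up.
        return None
    try:
        # Empty builtins and locals: no names are reachable from the string.
        return eval(expr, {"__builtins__": {}}, {})
    except (SyntaxError, ZeroDivisionError):
        return None
```

For example, `evaluate_expression("2+3*4")` returns `14`, while malformed input such as `"1/"` or a division by zero returns `None` so the UI can show an error instead of crashing.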
Core application language
Camera capture, frame processing, and visual overlay rendering
Real-time hand landmark detection (21 landmarks per hand)
Numerical operations and landmark coordinate processing
Gesture debouncing: Without debouncing, the system registers a new input on every processed frame (15-60 per second, depending on the webcam) while the user holds a gesture steady.
Implemented a 1.25-second cooldown timer. Inputs are only registered when the gesture changes AND the timer has expired. This provides a natural input rhythm.
Lighting sensitivity: MediaPipe hand detection degrades significantly in low-light or uneven lighting conditions.
Applied adaptive histogram equalization to input frames before passing to MediaPipe. Added user guidance text about lighting conditions in the UI overlay.
Multi-hand ambiguity: When both hands are in frame, determining which hand corresponds to which intended input is ambiguous.
Assigned the left hand to digit selection (0-5) and the right hand to operator selection. Color-coded landmarks give clear visual feedback on which hand is currently active.
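The hand-to-role assignment can be sketched as a lookup keyed on the handedness label. This is a sketch under stated assumptions: the function name `symbol_for` is hypothetical, and the `OPERATORS` table (which finger count selects which operator) is invented for illustration because the mapping is not specified here. The labels `"Left"` and `"Right"` match the strings MediaPipe reports for handedness; MediaPipe determines them assuming a mirrored (selfie-style) image, so the labels may need swapping for an unmirrored feed.

```python
from typing import Dict, Optional

# Hypothetical operator table: right-hand finger count -> operator.
OPERATORS: Dict[int, str] = {1: "+", 2: "-", 3: "*", 4: "/"}


def symbol_for(handedness: str, finger_count: int) -> Optional[str]:
    """Map one hand's label and finger count to an expression symbol."""
    if handedness == "Left":
        # Left hand selects a digit 0-5.
        if 0 <= finger_count <= 5:
            return str(finger_count)
        return None
    if handedness == "Right":
        # Right hand selects an operator; unmapped counts are ignored.
        return OPERATORS.get(finger_count)
    return None
```

Keeping the table as data rather than branching logic makes it straightforward to swap in a different operator layout later.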
Real-time CV applications require careful latency management, because frame processing time directly impacts user experience. MediaPipe's 30 ms per-frame inference was acceptable but left little headroom for additional processing within the roughly 33 ms budget of a 30 FPS stream. The debouncing interval was tuned experimentally: 1.25 s felt natural for deliberate input but would need to be reduced for power users. Cross-platform testing revealed significant variance in webcam frame rates (15-60 FPS depending on hardware), which required adaptive frame skipping.
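A simple form of adaptive frame skipping is to process every n-th frame, with n derived from the camera's frame interval and the measured processing time. The sketch below is one possible policy, not the project's actual one; the function name `frame_stride` is hypothetical.

```python
import math


def frame_stride(camera_fps: float, processing_ms: float) -> int:
    """Return n such that processing every n-th frame keeps up with the camera."""
    if camera_fps <= 0:
        raise ValueError("camera_fps must be positive")
    frame_interval_ms = 1000.0 / camera_fps
    # Number of camera frames that arrive while one frame is being processed.
    return max(1, math.ceil(processing_ms / frame_interval_ms))
```

With 30 ms of processing per frame, a 60 FPS camera (about 16.7 ms per frame) gives a stride of 2, while 15 FPS and 30 FPS cameras give a stride of 1. In the capture loop, a frame would be processed only when `frame_index % stride == 0`, and the stride could be recomputed periodically from a running average of the measured processing time.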
Allow users to define custom gesture mappings and create personalized control schemes.
Port to mobile devices using the front-facing camera for on-the-go touchless calculation.
Design gesture sets specifically for users with limited hand mobility, using larger gestures and longer debounce windows.
Extend beyond basic arithmetic to support trigonometric functions, logarithms, and calculus operations via gesture menus.