Search and Elimination

I have searched through all the technologies that can recognize gestures:
- Ultrasound: can be used, but at this scale its precision, even in the best cases, does not get below 1 cm.
- Electric field sensors: not precise either, and they can't detect non-conductive objects.
- Structured light: Kinect-like? But the Kinect is less accurate, and not as responsive as the Leap.
Tracking the tiny details
How have they managed to capture the whole surface of 3D objects, even the parts that are not facing the sensor?
The Missing Link!
Depth of scene from depth of field
"Along each geometric ray between the image plane and the lens, the image moves from being in relatively poor focus, to a point of best focus, and then back to being out of focus. Thus if we could trace along the path of each incoming ray to find the point of exact focus then we could recover the shape of the 3D world."
The starting point is the thin lens law, which ties the scene depth to the optics:

\[ \frac{1}{D} + \frac{1}{v_0} = \frac{1}{f} \]

where \(D\) is the distance between the lens and the imaged point, \(v_0\) the distance between the lens and the image plane, and \(f\) the focal length of the lens.
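Solving for \(D\) makes the quoted idea concrete: once we know the image-plane distance at which a point comes into exact focus, its depth follows directly:

\[ D = \frac{f\,v_0}{v_0 - f} \]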
Because differing aperture size causes differing focal errors, the same point will be focused differently in the two images. The critical fact is that the magnitude of this difference is a simple function of only one variable: the distance between the viewer and the imaged point. To obtain an estimate of depth, therefore, we need only compare corresponding points in the two images and measure this change in focus.
The difference in localized Fourier power is a monotonically increasing function of the blur in the second image. Equivalently, by the lens law above, the distance to the imaged point is a monotonically decreasing function of the difference in localized Fourier power.
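To make that comparison concrete, here is a minimal sketch in Python/NumPy (my own illustration, not Leap code). It uses local Laplacian energy as a cheap stand-in for windowed Fourier power; the stand-in preserves the monotonic relation the argument needs:

```python
import numpy as np
from scipy import ndimage

def local_power(img, window=15):
    """Local high-frequency power: squared Laplacian response,
    averaged over a small neighbourhood. A cheap stand-in for
    windowed Fourier power."""
    hf = ndimage.laplace(img.astype(np.float64))
    return ndimage.uniform_filter(hf * hf, size=window)

def relative_blur_map(img_small_ap, img_large_ap, window=15, eps=1e-8):
    """Compare corresponding points of two registered images of the
    same scene taken with a small and a large aperture. Where the
    large-aperture image has lost more high-frequency power, the
    point lies farther from the plane of best focus. The result is
    a monotonic function of that focal error, not metric depth."""
    p_small = local_power(img_small_ap, window)
    p_large = local_power(img_large_ap, window)
    # A ratio below 1 means the large aperture blurred this region more.
    return p_large / (p_small + eps)
```

Turning this monotonic map into metric depth would take a calibration step, e.g. a lookup table built from the lens law and the two known aperture settings.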
And as this post is not a formal scientific article, I won't put in a lot of math, just the references to it. All you need to retain is that with some tinkering you can get depth from focus and defocus; to know in detail how this can be done, you can start by reading [Pentland87, Pentland89] and [Xiong93], then follow all the new work that came out of these papers.
How is this used in the Leap Motion?
The Leap Motion uses ~3 cameras. Each camera should see the same picture frame, to remove the need for calibration as in Fig.4, so a basic system of mirrors and lenses is needed: the scene image enters the half-silvered mirror system and is split into 3 paths, each ending in a lens with a different focal point. The pictures the cameras transmit are similar to the ones shown above in Fig.3, but captured simultaneously and in real time.
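Here is what a depth estimate over such a three-image focal stack could look like, as a hypothetical Python sketch (the function names and the snap-to-the-nearest focal plane are my simplifications; a real pipeline would interpolate between the planes, i.e. depth from defocus, to reach fine precision):

```python
import numpy as np
from scipy import ndimage

def focus_measure(img, window=15):
    # Local contrast (squared Laplacian energy) as a sharpness score.
    hf = ndimage.laplace(img.astype(np.float64))
    return ndimage.uniform_filter(hf * hf, size=window)

def depth_from_focal_stack(images, focus_depths, window=15):
    """images: registered frames, one per focal setting (here, 3).
    focus_depths: the scene distance each lens focuses on.
    For every pixel, pick the frame in which it is sharpest; that
    frame's focus distance is a coarse per-pixel depth estimate."""
    scores = np.stack([focus_measure(im, window) for im in images])
    sharpest = np.argmax(scores, axis=0)        # index of the sharpest frame
    return np.asarray(focus_depths)[sharpest]   # coarse depth map
```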
Acceleration of the computation
The post-Leapmotion era

The introduction of devices with such precision and accuracy, built at the same time on simple mathematical models, marks a break at two levels:
- The way input should be handled in today's computers and operating systems
- The events and how they get routed inside apps, widgets, daemons... (a post-event-abstraction era?)