Adding perspective with face tracking

One of the most immersive applications we have played with is Let’s create! Pottery; it’s a zen-type game that allows its users to create pottery in an incredibly intuitive way.


One feature that helps make this experience immersive is its responsive background: as the user pivots the device, the background pans slightly in response. It is similar to a parallax effect in that it gives the user a feeling of depth and realism.

I imagine this is implemented using the accelerometer and/or gyroscope, which let you detect the motion and orientation of the device and adjust the scene accordingly.
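
To make that guess concrete (this is not the game's actual code), here is a minimal sketch of mapping device tilt to a small background pan. The function name, the 30 degree tilt range and the 20 pixel maximum shift are all invented for illustration; the roll and pitch angles are assumed to come from whatever sensor API the platform offers.

```python
import math

def background_offset(roll_rad, pitch_rad,
                      max_tilt_rad=math.radians(30), max_shift_px=20.0):
    """Map device tilt to a small (x, y) background pan, in pixels.

    Hypothetical helper: tilt beyond max_tilt_rad is clamped so the
    background never slides further than max_shift_px on either axis.
    """
    def scale(angle):
        # Normalise the angle to [-1, 1], clamping extreme tilts.
        t = max(-1.0, min(1.0, angle / max_tilt_rad))
        # Pan against the tilt (an assumed design choice) so the
        # background appears to sit behind the foreground.
        return -t * max_shift_px

    return scale(roll_rad), scale(pitch_rad)
```

In practice the raw sensor readings are noisy, so some smoothing before this step would probably be needed.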

In this experiment we took this a small step further. Rather than relying on the device being moved, we ‘watch’ the user’s face and adjust the virtual camera based on where the face is in relation to the screen, providing the illusion of being able to ‘peek’ around the scene.

Face Tracking

You have a few options when implementing this. The first that came to mind was using one of the algorithms implemented in OpenCV, but after a little research I discovered that this functionality is readily available on both iOS and Android and is most probably hardware accelerated (which makes sense, considering one of the most popular functions of a smartphone is taking photos).

Android provides two paths. One uses the android.media package, which provides a class (FaceDetector) to query an image for faces. The other, and the more desirable, uses the class android.hardware.Camera.Face: you ask the camera to provide you (via the FaceDetectionListener) with a set of faces for each frame.
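
Because a frame can contain several faces (or none), something has to decide which face drives the camera. A minimal sketch of one possible rule follows: take the largest face whose confidence clears a threshold. The Face class here is a plain stand-in with illustrative field names, not the Android type, and the threshold of 50 is an assumption to tune.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Face:
    # Stand-in for one detected face; field names are illustrative only.
    left: int
    top: int
    right: int
    bottom: int
    score: int  # detector confidence, assumed to run from 1 to 100

    @property
    def area(self) -> int:
        # Degenerate rectangles count as zero area.
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)

def primary_face(faces: List[Face], min_score: int = 50) -> Optional[Face]:
    """Return the largest sufficiently confident face, or None if there is none."""
    candidates = [f for f in faces if f.score >= min_score]
    return max(candidates, key=lambda f: f.area, default=None)
```

Choosing the largest face assumes the person holding the device is closest to the camera; when nothing is returned, the camera can simply stay where it was.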

For iOS, the Core Image framework became available in iOS 5. It provides a way to work efficiently with images in real time (i.e. by taking advantage of the GPU). One class bundled with Core Image is the face detector (CIDetector), which provides a quick way of detecting faces in an image.

Once we have detected the user’s face, it’s just a matter of normalising its position and translating the camera accordingly.
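
As a rough sketch of that last step (not the code from the repository), the face centre can be mapped to the range -1 to 1 on each axis and then used to offset the camera from its rest position. The default range of -1000 to 1000 matches the coordinate space Android documents for Camera.Face.rect; the function names, the rest position and the maximum offsets are hypothetical, and whether x needs flipping depends on whether the front camera image is mirrored.

```python
def normalise_face(center_x, center_y, half_range=1000.0):
    """Map a face centre from [-half_range, half_range] to [-1, 1] per axis."""
    def clamp(v):
        return max(-1.0, min(1.0, v / half_range))
    return clamp(center_x), clamp(center_y)

def camera_position(norm_x, norm_y, base=(0.0, 0.0, 5.0),
                    max_offset=(1.0, 0.5), mirror_x=False):
    """Translate the virtual camera away from its rest position `base`.

    Assumptions: image y grows downwards while world y grows upwards,
    so y is negated; set mirror_x=True if the camera image is mirrored.
    """
    bx, by, bz = base
    if mirror_x:
        norm_x = -norm_x
    # The depth (z) of the camera is left unchanged in this sketch.
    return (bx + norm_x * max_offset[0], by - norm_y * max_offset[1], bz)

# Example: a face centred right of and above the middle of the frame.
nx, ny = normalise_face(500, -1000)
position = camera_position(nx, ny)
```

Keeping the maximum offsets small preserves the ‘peek’ feel without letting the camera swing far enough to reveal the edges of the scene.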

If you’re curious, check out the code on GitHub.