Facial tracking for character performance

edited March 2015 in General
In a conversation on another thread, David Boccabella mentioned his research into facial tracking, and I thought the subject was so interesting that it warranted its own thread.
David Boccabella:

I did an experiment for a facial waldo. Precision pots were horribly expensive, especially linear ones. When I finally got some I found that the pressure needed to move them was far above, say, the pressure a lip can give.
However, I did have some very cheap 9-gram servos that I got in bulk from eBay. By stripping the motors and most of the gears out of them and putting in a longer horn, I had a 'linear' style servo that almost moved when one breathed on it.
Was very easy to do the prototype waldo then.
Current experiment is using EMG to pick up facial muscle movements :)
With complex challenges like facial tracking it's always great to tackle them from as many perspectives as possible, so if any of you have ideas, please share!

I have no experience with EMG (electromyography), so I can't wait to see David's results.  I do have experience with optical tracking and microcontrollers, so I figured I'd see (at least academically) how I might approach the challenge using off-the-shelf components for optical tracking.

The problem with optical tracking is that you typically need a full computer to handle all the processing required to track multiple markers; that task would overwhelm a single microcontroller.  The folks over at Charmed Labs have come up with a novel solution to this problem.  Their product, Pixy, is an optical tracking camera with an onboard microcontroller.  It is capable of tracking multiple colored markers and outputting their positional data in a variety of protocols.  This means Pixy does all the heavy lifting, and you can hook it up to your own microcontroller (like an Arduino) and just read the data feed.
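To make the "just read the data feed" idea concrete, here is a minimal Python sketch of what the host side might do with Pixy-style detections. The tuple layout (signature, x, y) is my assumption for illustration, not the actual Pixy wire format.

```python
# Hypothetical sketch: group Pixy-style detections by color signature so each
# signature maps to one facial marker. The (signature, x, y) tuple layout is
# an assumption, not the real Pixy protocol.

def group_markers(blocks):
    """blocks: list of (signature, x, y) tuples from the tracking camera.
    Returns {signature: (x, y)}, keeping the first detection per signature."""
    markers = {}
    for sig, x, y in blocks:
        if sig not in markers:        # ignore duplicate detections of a color
            markers[sig] = (x, y)
    return markers

frame = [(1, 160, 100), (2, 140, 130), (1, 161, 101)]
print(group_markers(frame))  # {1: (160, 100), 2: (140, 130)}
```

With each color signature tied to a known point of articulation, the dictionary above is all a downstream servo controller would need per frame.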


With that in mind, here is a conceptual setup.

The tracker:
  • Pixy optical tracking camera
  • Head mount (strapping rig from a hardhat or something similar)
  • Articulated arm for camera.  I'd use an Israeli arm.
  • Optional LED to keep performer's face evenly lit as they move around.
The brain:
The markers:
  • For durable markers I'd try something like brightly pigmented PAX paint.  The camera can be trained to recognize and label different colors.  Each color would represent a specific point of articulation on the face.

Now let's look at the pros and cons for this theoretical implementation.  These pros and cons are not based on comparison with other implementations; they just consider the potential positive and negative aspects of this implementation on its own.


Pros:
  • Inexpensive.  The electronics would be around $100.
  • Scalable.  Adding new markers or multiple performers would be trivial.
  • No moving parts.  Each moving part in a system introduces a potential point of failure.
  • Short setup time.  Just strap on a helmet and paint a few colored dots on the face.
  • No bulky computer with slow booting operating system and demanding power requirements.

Cons:
  • Might not work for all applications.  For example, I believe David is planning on tracking a performer's face while they are inside a character suit.  That would not leave much room for a camera, and you might not want to light up the inside of the character's head.  Depending on character design it might be possible, but it has spatial requirements.
  • Potential for data noise.  You might need to add in some positional smoothing algorithms to account for possible jitter.
  • Potential for latency.  Each link in the data chain adds potential for latency.  (Camera > microcontroller > processing and potential smoothing algorithms > servo controller > servo.)  This could be mitigated with the use of lower-latency digital servos and a well-optimized data pipeline.
  • Accuracy with minute movements.  Much like predicting latency, without building this system it's difficult to determine what the fidelity would be like.
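To illustrate the smoothing mentioned in the cons above, here is a minimal Python sketch of one common approach, an exponential moving average. The alpha value is just an illustrative tuning choice, not something from a real build.

```python
# A minimal positional-smoothing sketch: an exponential moving average damps
# frame-to-frame jitter at the cost of a little lag. alpha trades
# responsiveness (high alpha) against smoothness (low alpha).

def smooth(samples, alpha=0.3):
    out = []
    est = samples[0]
    for s in samples:
        est = alpha * s + (1 - alpha) * est  # blend new sample with history
        out.append(est)
    return out

noisy = [100, 104, 98, 102, 150, 101]  # one jitter spike at index 4
print(smooth(noisy))                   # the 150 spike comes out well damped
```

In a real pipeline you would apply this per marker, per axis, and tune alpha against the latency budget.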
As with any system, it would need to be calibrated to each individual performer, so their unique range of movements could be mapped to the movements of the character.
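That per-performer calibration could be sketched like this in Python: record the performer's min/max for a marker during a warm-up pass, then linearly remap live readings into the character's servo range. All ranges below are made-up illustrative numbers.

```python
# Sketch of per-performer calibration: measure the performer's own min/max
# for a marker during warm-up, then remap live readings into the character's
# servo range (here, servo pulse widths in microseconds; values illustrative).

def calibrate(perf_min, perf_max, servo_min, servo_max):
    span = perf_max - perf_min
    def remap(value):
        t = (value - perf_min) / span
        t = min(max(t, 0.0), 1.0)            # clamp outside calibrated range
        return servo_min + t * (servo_max - servo_min)
    return remap

jaw = calibrate(perf_min=12, perf_max=48, servo_min=1000, servo_max=2000)
print(jaw(30))  # halfway through the performer's range -> 1500.0
```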


For now this is just a fun thought exercise inspired by David's research with Electromyography.


If anyone else out there has another way they might tackle this challenge please post your thoughts!


/Chris




Comments

  • David Boccabella Moderator
    edited March 2015
    Really nice unit. I'll have to keep my eye on that (adds to shopping list).
    A slightly cheaper alternative, if you only want to track 4 points, is the PixArt IR camera in a Wii Remote.
    This little camera can track up to 4 IR blobs and return the X/Y coords of each via I2C.

    I have also played a little with the SimpleCV libraries on an RPi with the RPi camera. It's slow, but one can always upgrade to faster processor subsystems if needed. The SimpleCV libraries are great because you can take a picture, strip out or look for colors, and use the Haar features to look for particular facial points.
    Here is one pic of me testing a camera that looks directly at my mouth to pick up points on my lips.

    I used fluorescent markers with UV LEDs for illumination.


    Although the concept worked, it was still too slow (and unwieldy) for my needs, so I discarded it. However, it might work for some people.



  • David Boccabella Moderator
    edited March 2015
    Waldos are great things one can use to translate a movement. If you make a waldo shield for an arm, you can use it to accurately control an animatronic.
    Here are some examples of a facial Waldo (not mine)


    There is a movie of its use here:
    http://www.marcwolf.org/gallery/?moid=1787


  • David Boccabella Moderator
    edited March 2015
    Currently I am looking at getting to the heart of the matter and using EMG to pick up the electrical signals that drive the facial muscles.
    This article shows a lot of promise:
    http://ruedelametallurgie.free.fr/utils/pdf/Gibert2009.pdf




    Of course, the accurate placement of the electrodes is the key issue. However :) as we are on a special effects site that has lots of tutorials on making a face mold and then sculpting a silicone prosthetic on it, it's a very small step to placing the electrodes on the dummy head and then building a low-profile facial appliance around them. So to attach, say, 10 electrodes, all you would need to do is stick the appliance on with Telesis 5, making sure that you leave the electrode contacts clear.  An application of anti-sweat solution might also help.
    :)
  • Another way (and I hope I'm not getting too technical here) is that I picked up a batch of AD620s on eBay really cheap.
    http://www.ebay.com.au/itm/131212471545?_trksid=p2057872.m2749.l2649&ssPageName=STRK:MEBIDX:IT
    These make great pre-amps for the EMG signals. I can multiplex their outputs together using a DG409 multiplexer to get one signal out, and then patch it into a
    http://www.advancertechnologies.com/p/muscle-sensor-v3.html
    The DG409 is a low resistance multiplexer that handles positive and negative signals.
    Controlling this with an Arduino, I can cycle through the various channels, read the outputs, and then have the Arduino send the results to another processor in the form of a string of numbers.
    As each number will be the signal from a facial muscle, one can use that to move a corresponding servo. You don't even need a 1-to-1 muscle-to-servo mapping. For example, frowning can raise or lower animatronic ears.
    The above configuration can give 16 channels for about $200.

    If you have more money, then www.openbci.com is a fantastic concept, as you can get up to 16 channels (main board plus daughter board). I'd definitely recommend this as well (except that I cannot afford it - *looks pleadingly and rattles alms-for-the-poor tin*)

    Take Care
    Marc
  • The Wii blob-tracking IR camera is great; I've used that in the past for applications like whiteboards.  

    The beauty of the Pixy camera is you can identify individual points based on their color, so the system can label and track them uniquely.  It also outputs a data stream that requires no additional processing.

    You can also set up markers that are just 2 colors side-by-side.  With a 2-color marker you are able to get rotation and distance in addition to X/Y.  While that might not be as useful for facial tracking, it's a pretty nice feature.

    They cover that in a bit more detail in their video:
    https://www.youtube.com/watch?v=J8sl3nMlYxM
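For the curious, here is a rough Python sketch of the geometry behind a two-color marker: the two blob centers give you a roll angle, and their apparent separation gives a crude distance estimate. The reference separation and coordinates are invented for illustration.

```python
# Sketch of what a two-color marker buys you: from two blob centers you can
# derive roll (angle of the pair) and relative distance (apparent separation
# shrinks as the marker moves away). All numbers are illustrative.
import math

def pair_pose(p1, p2, separation_at_1m=80.0):
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    angle = math.degrees(math.atan2(dy, dx))  # roll of the marker pair
    sep = math.hypot(dx, dy)                  # pixels between the two blobs
    distance = separation_at_1m / sep         # crude pinhole estimate, metres
    return angle, distance

angle, dist = pair_pose((100, 100), (140, 100))
print(angle, dist)  # 0.0 degrees, 2.0 m (40 px apart vs 80 px at 1 m)
```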

    I'm really interested in the EMG signals.  How much signal noise do you get, and how minute (and accurate/reliable) are the movements you can detect?  I really want to get some sensors and play with that technology now.

    /Chris
  • Hi Chris.
    With the one-channel unit I experimented with from Advancer, I got some very good readings re sensitivity. As the basic Advancer Technologies V3 board is about $50 with electrodes, it's an excellent testbed for seeing what can be done.

    The big issue comes in when one is trying to expand this and cost is the factor. There is the OpenBCI board - very nice and small but that is about $500 and gives you 8 channels. The ADS1299 is a brilliant chip and is designed for EKG/EMG/EEG work.

    The good thing about EMG is that the signals are fairly strong (or "bright") - unlike EEG. So even small muscle movements can be picked up; however, the size of the electrode will always be part of the issue. The OpenBCI project sells some very nice gold-plated electrodes that are small (about 4mm), but I have found where I can get disposable EMG cups cheaply.

    Search Google Images for "facial EMG". That will give you a lot of information re the density of muscle observation.
    I'll put some links here to give an indication of accuracy.
    http://www.biopac.com/h27-facial-emg
    http://www.biopac.com/researchApplications.asp?Aid=27&AF=131&Level=3
    http://www.mdpi.com/2218-6581/3/3/289/htm
    Remember - eyes and blinking are also EMG :)

    If cost is a real issue (as it is for me), one can consider multiplexing the signals. I am still waiting for my shipment of AD620s to arrive so I can push further here.

    My idea is this: using the AD620 as a preamp to one of the Advancer boards and a DG408 multiplexer, one can effectively feed many channels into one. This is the way they do it in larger systems costing $1000+.
    The DG series is very important. Firstly, they allow +/- swings and have a very low switch resistance (100 ohm), so their impact on the signal is minimal. They can also handle almost video bandwidth, which is well above what we are working with (below 50Hz).
    The choice of an AD620 (or similar differential amp) is likewise important. Yes, there will be a LOT of noise, but the key here is the DIFFERENTIAL amp part: it measures only the difference between the +/- leads, not the overall input.
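As a concrete aside (my own sketch, based on the AD620 datasheet's gain equation G = 1 + 49.4 kΩ/R_G), here is the arithmetic in Python, including why the differential input helps: noise common to both leads subtracts out before amplification.

```python
# The AD620's gain is set by one external resistor R_G; the datasheet formula
# is G = 1 + 49.4 kOhm / R_G. This sketch picks R_G for a target gain and
# shows the differential-input benefit: common-mode noise on both electrodes
# cancels before it is amplified. Signal values are illustrative.

def rg_for_gain(gain):
    return 49.4e3 / (gain - 1)        # resistor value in ohms

def diff_amp_out(v_plus, v_minus, gain):
    return gain * (v_plus - v_minus)  # only the lead difference is amplified

print(round(rg_for_gain(100)))        # roughly 499 ohms for a gain of 100

# A 2 mV EMG signal riding on 50 mV of mains hum common to both electrodes:
print(round(diff_amp_out(0.052, 0.050, 100), 3))  # hum cancels -> 0.2 V out
```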
    My only concern is crosstalk, and signal settling when moving from a channel with a higher potential to one with a lower one - how long will it take for the various components to settle to give an accurate reading?

    The DG series is 8 channels, so you will need 3 digital lines to switch between channels. You can stack them to get 16 channels and use the enable line as an extra address line.  If (as with the RFduino) you have a limited number of lines to work with, you can use a 4024 (or 74HC equivalent), which is a binary counter, so you just pulse in a clock signal and the chip will count through your address lines.
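The addressing Dave describes can be sketched in Python: 3 bits select one of 8 mux inputs, and a 4th bit (driving the enable pins of the two stacked chips) picks the chip, giving 16 channels on 4 lines. This is just the bit bookkeeping, not Arduino code.

```python
# Sketch of the stacked-multiplexer addressing scheme: channels 0-7 on the
# first DG408, channels 8-15 on the second, selected via the enable pins.

def mux_address(channel):
    """Return (A0, A1, A2, chip_select) line states for channels 0-15."""
    if not 0 <= channel <= 15:
        raise ValueError("channel out of range")
    return (channel & 1,          # A0: least significant address bit
            (channel >> 1) & 1,   # A1
            (channel >> 2) & 1,   # A2
            channel >> 3)         # which chip's enable line to assert

print(mux_address(5))   # (1, 0, 1, 0): input 5 on the first chip
print(mux_address(13))  # (1, 0, 1, 1): input 5 on the second chip
```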

    Years ago, when doing memory expansions for early computers (TRS-80), one would stack chips on top of each other, lifting the pins that were not common to all of them. It should be possible to stack the AD620s into 2 lots of 8 (for a 16-channel system) with an RFduino, 2 x DG408, a 4024, and a 14v LiPo battery, and have a very compact unit that can be carried easily.

    Use the idea of a silicone prosthesis to aid with accurate positioning of the electrodes, and you're set.

    About the 14v LiPo: the great thing about LiPos is that they have a charge plug, and you can access the individual cells there.  So a 14v LiPo is a +/- 7v battery pack, with a 3v feed for the RFduino.

    Arghh.. I've given all my secrets away here..  :)

    Enjoy all and I hope others will show me better.
    Dave



  • David Boccabella Moderator
    edited August 2016
    Hi..
    An update regarding the Pixy.  Version 5 is out and is just as impressive as the previous versions.
    The only downside from my viewpoint is the size, as the camera is soldered directly to the board.
    I posted in the CMU forum re possibly doing a version with a ribbon cable (a.k.a. the RPi camera) so that the camera can be mounted separately.

    They say they are working on a solution :)

    http://www.cmucam.org/boards/8/topics/7432?r=7434#message-7434


    Hope this gives some people some hope :)

  • As far as I can see, I have the same facial tracking in mind.  I have a 3D background, and my idea is to take the TCP stream from this program, Faceshift (http://www.faceshift.com/).  It uses Kinect for facial tracking and translates it into a stream of blendshape animation.  I think the challenge is to translate it on the fly to the servos' "collection of poses".  Hope this gives new ideas to think about.
  • Hi Folks.
    Starting to experiment with my OpenBCI 8-bit board.
    Had a friend help wire me up with single electrodes. I will try dual electrodes in a few days; dual electrodes might be better, as they are able to mask external signals more.

    Wired for EMG.. Not Sound.


    The OpenBCI GUI. The board sends 8 sets of numbers - one for each electrode. I can process the raw data and set maxes and mins to work out the servo movements.
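The max/min idea could be sketched like this in Python: track each electrode's observed range on the fly and normalize raw readings into 0-1, which could then drive a servo. The values are invented for illustration.

```python
# Sketch of the running max/min normalization: the observed extremes of each
# electrode become its calibration bounds, and each raw reading is mapped to
# a 0-1 level suitable for driving a servo. Readings are illustrative.

class ChannelNorm:
    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def update(self, raw):
        self.lo = min(self.lo, raw)   # running min and max become the
        self.hi = max(self.hi, raw)   # calibration bounds for this channel
        if self.hi == self.lo:
            return 0.0                # no range observed yet
        return (raw - self.lo) / (self.hi - self.lo)

brow = ChannelNorm()
for reading in (210, 250, 900, 400):
    level = brow.update(reading)
print(round(level, 2))  # 400 sits partway between the min 210 and max 900
```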



    The OpenBCI board. Very small and wearable.

    I am already seeing when muscles flex, hold, and relax, so it should be easy to handle eyebrows, frowns, and winks.

    Now working on lip lifts, smiles, and talking.  Looks promising so far.

    Take Care
    Dave
  • Very cool!  I'm curious to see how much resolution you can get on the data for all the facial regions you track.  

    /Chris
  • Hi Chris
    What I can see is the initial burst of activity when the muscle is activated, which then settles down to about 20% of the initial signal while the muscle is holding, and drops back to baseline when the muscle is relaxed.
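Dave's burst/hold/relax observation suggests a simple state classifier. Here is a hedged Python sketch that rectifies the signal, takes a short moving-average envelope, and thresholds it; the thresholds and samples are illustrative, not measured values.

```python
# Sketch of turning the burst/hold/relax observation into states: rectify
# the EMG, take a short moving average as the envelope, then classify with
# two thresholds (activation burst vs. the ~20% holding level).

def envelope(samples, window=4):
    rect = [abs(s) for s in samples]          # full-wave rectification
    out = []
    for i in range(len(rect)):
        chunk = rect[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))   # trailing moving average
    return out

def classify(level, burst=0.5, hold=0.1):
    if level > burst:
        return "flex"
    return "hold" if level > hold else "relaxed"

emg = [0.02, 0.9, -0.8, 0.85, 0.2, -0.18, 0.19, 0.03, -0.02, 0.01]
print([classify(e) for e in envelope(emg)])
```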

    Once I have a way to secure the electrodes properly I will be able to refine my data :)

    Take Care
    Dave