The goal of this benchmark is to develop a computer program that controls a pan-tilt camera to track and follow a target object in a cluttered environment. The programming language is Python, the robot on which the camera is mounted is a Sony Aibo ERS-7, and the target object is a yellow rubber duck.

The robot has a Display device shown in the robot window, which can be opened by right-clicking on the robot and selecting the Robot window item. This device displays the camera image as well as drawings resulting from the tracking procedure.


The benchmark lasts at most 2 minutes and 20 seconds. The performance is measured as the hit rate, i.e. the percentage of frames in which the target object is recorded at the center of the camera:

$$\text{hit rate} = \frac{\text{hits}}{\text{frames}}$$

The target object detection is checked every 128 ms based on the camera orientation, using the following formula:

$$v_1 = \operatorname{norm}(T_o - T_c) \qquad v_2 = \operatorname{norm}(R_c \, c) \qquad \text{hit} = |v_{1,x} - v_{2,x}| < \varepsilon \;\wedge\; |v_{1,y} - v_{2,y}| < \varepsilon \;\wedge\; |v_{1,z} - v_{2,z}| < \varepsilon$$

where $T_o$ is the target object's global position, $T_c$ is the camera's global position, $R_c$ is the camera's global rotation matrix, $c = [0, 0, -1]$ is the camera's recording axis, and $\varepsilon = 0.1$.
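This check is performed by the benchmark supervisor, not by your controller, but a minimal Python sketch of it may clarify the formula. The function name is hypothetical, and the sketch assumes numpy arrays for the positions and a 3×3 rotation matrix:

  import numpy as np

  def is_hit(target_position, camera_position, camera_rotation, epsilon=0.1):
      # v1: unit vector pointing from the camera to the target object.
      v1 = target_position - camera_position
      v1 = v1 / np.linalg.norm(v1)
      # v2: the camera recording axis c = [0, 0, -1] expressed in global coordinates.
      v2 = camera_rotation @ np.array([0.0, 0.0, -1.0])
      v2 = v2 / np.linalg.norm(v2)
      # The frame counts as a hit if both unit vectors agree on every axis.
      return bool(np.all(np.abs(v1 - v2) < epsilon))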

How to improve the hit rate?

The benchmark goal consists of two separate tasks:

  1. Detect the target object in the camera image.
  2. Move the pan and tilt camera motors to center the target object in the image.
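
As a starting point, a Webots controller addressing these two tasks typically has the following structure. The device names below are placeholders, so reuse those from the provided sample controller:

  from controller import Robot

  robot = Robot()
  timestep = int(robot.getBasicTimeStep())

  # Device names are placeholders: reuse the ones from the sample controller.
  camera = robot.getDevice("camera")
  camera.enable(timestep)
  pan_motor = robot.getDevice("pan_motor")
  tilt_motor = robot.getDevice("tilt_motor")
  # An infinite target position puts the motors in velocity-control mode.
  pan_motor.setPosition(float("inf"))
  tilt_motor.setPosition(float("inf"))

  while robot.step(timestep) != -1:
      image = camera.getImage()  # raw BGRA bytes, width x height x 4
      # Task 1: detect the target object in `image`.
      # Task 2: set the pan and tilt motor velocities to re-center it.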

Improve object detection

The first improvement is therefore to develop a better visual tracking algorithm. The provided sample controller creates a mask for yellow pixels, uses OpenCV image processing to extract blobs from the mask, and finally selects the largest blob. These three steps are not optimized, and several improvements are possible, for example:

  • Fine-tune the condition used to detect yellow pixels.
  • Apply morphological filters (dilation and erosion) to the mask to remove noise (both ideas are sketched just after this list).
  • Use information gathered in the previous frames to select the most promising blob.
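
For instance, the first two items could be prototyped as follows. The find_duck name and the threshold values are illustrative, and the bounds would need fine-tuning for the scene's lighting:

  import cv2
  import numpy as np

  def find_duck(bgra_image):
      # The Webots camera buffer is BGRA; convert to HSV, where a color such
      # as yellow is easier to isolate than in RGB (HSL would work similarly).
      bgr = cv2.cvtColor(bgra_image, cv2.COLOR_BGRA2BGR)
      hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
      # Yellow hue band; these bounds are illustrative starting values.
      mask = cv2.inRange(hsv, np.array([20, 100, 100]), np.array([35, 255, 255]))
      # Opening removes isolated noise pixels, closing fills holes in the blob.
      kernel = np.ones((3, 3), np.uint8)
      mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
      mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
      # Keep the largest remaining contour as the most promising blob.
      contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
      if not contours:
          return None
      m = cv2.moments(max(contours, key=cv2.contourArea))
      if m["m00"] == 0:
          return None
      return m["m10"] / m["m00"], m["m01"] / m["m00"]  # blob centroid (x, y)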
Other strategies could also be applied to the image in order to detect the target object, for example:
  • Use different color spaces, such as HSL, to extract the regions of interest.
  • Use constraints on the shape of the blob.
  • Use filtering algorithms (moments, Kalman filter, particle filter) to predict the position of the object in the next frame (a Kalman filter sketch follows this list).
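
To illustrate the last item, OpenCV's built-in Kalman filter can smooth and predict the blob position under a constant-velocity assumption. The noise covariance values below are illustrative and would need tuning:

  import cv2
  import numpy as np

  # Constant-velocity Kalman filter over the blob's pixel position: the state
  # is [x, y, vx, vy] and the measurement is the detected centroid [x, y].
  kalman = cv2.KalmanFilter(4, 2)
  kalman.transitionMatrix = np.array([[1, 0, 1, 0],
                                      [0, 1, 0, 1],
                                      [0, 0, 1, 0],
                                      [0, 0, 0, 1]], np.float32)
  kalman.measurementMatrix = np.array([[1, 0, 0, 0],
                                       [0, 1, 0, 0]], np.float32)
  kalman.processNoiseCov = 1e-3 * np.eye(4, dtype=np.float32)
  kalman.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)

  def track(centroid):
      # Predict where the blob should be this frame; correct with the
      # detection when available, otherwise fall back to the prediction.
      predicted = kalman.predict()
      if centroid is not None:
          kalman.correct(np.array(centroid, np.float32).reshape(2, 1))
          return centroid
      return float(predicted[0]), float(predicted[1])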

Improve object following

Once the target object position in the camera image is detected, you have to move the camera motors so that the object remains at the center of the image. The sample controller uses the following functions to move the camera:


  panHeadMotor.setVelocity(-1.5 * dx / width)
  tiltHeadMotor.setVelocity(-1.5 * dy / height)

where width and height are the camera image width and height, and dx and dy are the distances in pixels between the detected object center and the image center.
The speed factor value of 1.5 is not optimal. Tuning this factor or using a more precise method to move the motors, such as the PD controller sketched below, could also improve the hit rate.
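
For instance, the proportional rule above can be extended into a PD controller that damps overshoot when the duck moves quickly. This sketch assumes blob_x and blob_y come from the detection step, previous_dx and previous_dy are kept between frames, and the KP and KD gains are hypothetical starting values to tune on the benchmark:

  # dx, dy: pixel offsets between the blob centroid and the image center.
  dx = blob_x - width / 2
  dy = blob_y - height / 2

  # PD control: the derivative term reacts to how fast the error changes.
  KP = 2.0
  KD = 0.5
  panHeadMotor.setVelocity(-(KP * dx + KD * (dx - previous_dx)) / width)
  tiltHeadMotor.setVelocity(-(KP * dy + KD * (dy - previous_dy)) / height)
  previous_dx, previous_dy = dx, dy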