This benchmark aims at developing a computer program that controls a pan-tilt camera to track and follow a target object in a cluttered environment. The programming language is Python, the robot model on which the camera is mounted is a Sony Aibo ERS-7 robot, and the target object is a yellow rubber duck.
The robot has a Display device shown in the robot window, which can be opened by right-clicking on the robot and selecting the Robot window item. This device displays the camera image as well as drawings resulting from the tracking procedure.
The benchmark lasts at most 2 minutes and 20 seconds. The performance is measured as the hit rate, i.e., the percentage of frames in which the target object is recorded at the center of the camera:

hit rate = 100 × (frames with the target at the center) / (total number of frames)

The target object detection is checked every 128 ms based on the camera orientation, using this formula:

‖ (t − c) / ‖t − c‖ − R · a ‖ < ε

where t is the target object's global position, c is the camera's global position, R is the camera's global rotation matrix, a is the camera's recording axis, and ε is 0.1.
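Under the definitions above, the detection check can be sketched as follows (the function and variable names are illustrative, not taken from the benchmark source):

```python
import numpy as np

EPSILON = 0.1  # tolerance used by the benchmark

def target_centered(target_pos, camera_pos, camera_rotation, recording_axis):
    """Return True if the target lies within EPSILON of the camera's recording axis."""
    # Unit vector from the camera towards the target, in world coordinates.
    direction = np.asarray(target_pos, dtype=float) - np.asarray(camera_pos, dtype=float)
    direction /= np.linalg.norm(direction)
    # Recording axis rotated into world coordinates.
    axis = camera_rotation @ np.asarray(recording_axis, dtype=float)
    return np.linalg.norm(direction - axis) < EPSILON
```

For example, with the camera at the origin looking along the x axis, a target at (5, 0, 0) is counted as centered, while a target at (0, 5, 0) is not.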
The benchmark goal consists of two separate tasks: detecting the target object in the camera image, and moving the camera motors so that the object stays at the center of the image.
A first improvement would be to develop a better visual tracking algorithm. The provided sample controller creates a mask for yellow pixels, uses OpenCV image processing to extract the blobs from the mask, and finally selects the largest blob. These three steps are not optimized and leave room for improvement, for example in the choice of color thresholds or in filtering the mask before blob extraction.
Once the target object's position in the camera image is detected, you have to move the camera motors so that the object remains at the center of the image. The sample controller uses the following functions to move the camera:
panHeadMotor.setVelocity(-1.5 * dx / width)
tiltHeadMotor.setVelocity(-1.5 * dy / height)
where width and height are the camera image width and height, and dx and dy are the distances in pixels between the detected object's center and the camera image center.
The speed factor value of 1.5 is not optimal; tuning these factors or using a more precise method to drive the motors could further improve the hit rate.
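One possible refinement, sketched below under the assumption that a smoother response helps the hit rate, is to replace the fixed proportional gain with a proportional-derivative controller. The class name and gain values are illustrative, not part of the benchmark:

```python
class PDController:
    """Minimal PD controller sketch; not the benchmark's reference solution."""

    def __init__(self, kp, kd):
        self.kp = kp            # proportional gain (the sample controller uses 1.5)
        self.kd = kd            # derivative gain, damps oscillation around the target
        self.prev_error = 0.0

    def step(self, error):
        derivative = error - self.prev_error
        self.prev_error = error
        # Negative sign: drive the motor against the pixel error,
        # as in the sample controller's setVelocity calls.
        return -(self.kp * error + self.kd * derivative)
```

A usage sketch, with dx, dy, width, and height as defined above:

```python
# pan_pd = PDController(kp=1.5, kd=0.5)
# tilt_pd = PDController(kp=1.5, kd=0.5)
# panHeadMotor.setVelocity(pan_pd.step(dx / width))
# tiltHeadMotor.setVelocity(tilt_pd.step(dy / height))
```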