Vision in the loop: from seeing to interpreting


Cameras are routinely used to control all kinds of systems, for instance in production lines, where all parameters are clearly fixed. But cameras are being put to a much wider range of uses, as is evident from the increasing number of camera-based apps on smartphones and tablets. Interpreting the resulting images, however, can still be something of a challenge.

Article from Objective 20, 2013

Measuring and controlling systems on the basis of video images is also called “vision in the loop”. The camera is effectively used as a sensor which views and measures a position, a color or a movement. If the measured entity is found to be anomalous, the system is adjusted: this feedback is called the “loop”. This method of measuring and controlling is used in situations where the process under observation must not or cannot be disturbed, such as with conveyor belts or production lines. These systems are specific applications that operate with clearly defined parameters: if the color or shape of a product diverges from the requirements, production is adjusted. The challenge is to apply vision in the loop to less predictable situations. For instance, to let a car drive itself: a dream scenario that quite a number of parties are in fact currently working to realize.
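The feedback cycle described above can be sketched in a few lines. This is a minimal illustration, not an actual production-line controller: the target value, the tolerance and the control_step function are assumptions made for the example.

```python
# Minimal sketch of a "vision in the loop" feedback cycle: measure a product
# property from the camera image, compare it with the desired value, and feed
# any deviation back into the process. Values and names are illustrative.

TARGET_COLOR = 128   # desired measured value of the product (assumed units)
TOLERANCE = 10       # acceptable deviation before the process is adjusted

def control_step(measured: int) -> int:
    """Return a correction signal; 0 means no adjustment is needed."""
    error = TARGET_COLOR - measured
    if abs(error) <= TOLERANCE:
        return 0     # within spec: leave the process alone
    return error     # feed the deviation back into the process

print(control_step(100))  # too dark: positive correction of 28
print(control_step(125))  # within tolerance: no correction
```

The key point is the comparison against a fixed, known target: this is exactly what makes the conveyor-belt case tractable and the open-road case hard.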

Human in the loop

For the time being human beings will still be part of the measuring and controlling loop of vehicles. Human beings are capable of acting correctly even in complex situations or in unknown environments. Thus we are still able to understand foreign road signs even if they are different from the ones we are accustomed to, whereas a computer would produce an error message in the same situation. The simple warning that we must drive on the left in the United Kingdom suffices for us to apply the entire scale of inverted traffic rules properly, whereas a vision in the loop system would need to be specifically programmed to acquire each separate bit of information in order to do the same. Interpretation and deduction are skills that aren’t easily implemented in a computer. Image processing, such as sharpening contrast or separating colors, presents little challenge for computers, but subsequently interpreting the images produced is a different matter. That is the biggest challenge at the moment. What does the image actually depict? A human being, a dummy or a portrait?


If systems are to be able to deal with unpredictable situations, they will have to acquire knowledge of their environment and do more than just repeat a trick they have been programmed to do. What level of knowledge and intelligence is required to achieve this? Can systems like this already be used in certain fields? And what situations still pose important challenges? In order to answer these questions, Technolution devised and carried out a research project. Experiencing is learning: many of the qualities and challenges of a new technology only become evident when it is used in practice. The question to be answered was this: is it possible to fly a model quadcopter automatically using three software-operated cameras? The answer is “yes”, as will be evident from the case description below. It was a fun experiment, even if it didn’t result in a system that can be very easily applied to other situations. Camera effectiveness is impeded by the environment, by color differences, shadows or light intensity. But once you have identified these impediments you can develop safeguards to deal with them. One such safeguard, for instance, could be to keep the quadcopter hovering in a fixed position if the camera image doesn’t provide enough information to proceed.

Robot football as a playing field

The quadcopter experiment isn’t fundamental research, but amounts to applying the results of research on algorithms and image interpretation that has been conducted at universities. Universities however aren’t just interested in fundamental research; they are also keen to acquire experiential knowledge. In fact they’re using robot football as a literal playing field for their research. The football pitch, the lines and the goal each have their own color, and color segmentation makes it easy to map the playing field and to act upon it. This is also the way we went about trying to fly our quadcopter: by looking for colors that we know are there. That’s true for all currently operating vision-in-the-loop systems: they’re all trying to use as much prior knowledge as possible when observing the world. Google is putting this principle to very literal use with Google Glass, a pair of glasses with a built-in camera that observes the surrounding world and interprets the images. And of course Google has the advantage that it already has an impressive amount of prior knowledge.


Most of us in 2013 already carry this kind of technology with us on a daily basis. Modern smartphones and cameras are fitted with vision-in-the-loop technology, such as face detection (e.g. the device takes a picture when it sees someone smiling) and image stabilization. They’re typical gadgets; nice to have, but there’s no harm done if on occasion they don’t work properly. But they’re not suitable yet for situations where reliability is crucial. For instance if a judgment has to be made about real-life situations: should a rush-hour lane be opened to traffic or not? Vision in the loop is not 100% reliable in these circumstances and leaves the final decision to human judgment. Fully reliable systems are still some way off. The last few percentage points of reliability are very difficult to realize and are therefore very expensive.

Not 100% perfect, but still reliable

Vision in the loop is also used to automate jobs that people no longer can or want to do. The system can bring considerable added value to these situations even though it isn’t perfect. To name just one example: there are too many security cameras in our large cities and on our railways for staff to be able to monitor them all. Consequently the images are first screened by a computer, which only passes on anomalous images for human judgment. Human operators therefore only have to view a limited set of images. The system is not required to offer 100% reliability, and an occasional false alarm is no problem. The workload has still been considerably reduced. Of course the system can’t afford to overlook any real incidents: it should err on the side of caution, presenting too many images for human judgment rather than too few.
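The screening principle above can be expressed very compactly: flag every frame whose anomaly score exceeds a deliberately low threshold, so that the system produces extra false alarms rather than missed incidents. The scoring itself and the threshold value are assumptions for illustration.

```python
# Sketch of camera-image screening biased toward false positives: any frame
# scoring above a low threshold is passed on for human judgment. The anomaly
# scores would come from an upstream image-analysis step (not shown here).

LOW_THRESHOLD = 0.3  # deliberately low: false alarms cost little, misses a lot

def frames_for_human_review(scores):
    """Return the indices of frames an operator should inspect."""
    return [i for i, score in enumerate(scores) if score > LOW_THRESHOLD]

scores = [0.05, 0.10, 0.45, 0.20, 0.90, 0.35]
print(frames_for_human_review(scores))  # → [2, 4, 5]
```

With six frames scored, the operator only looks at three: the workload shrinks even though the screening itself is far from perfect.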

This is also true for the automated system that is increasingly being used in parking garages. The barrier at the exit is raised as soon as a vehicle approaches, even before the driver has inserted the parking ticket into the ticket reader. This is possible because the system captured the vehicle registration number upon entry and linked it with the corresponding ticket. When the vehicle attempts to exit, the system reads the registration number again and checks whether the matching ticket has been paid. Even if the system fails occasionally, the barrier can still be opened by inserting the ticket into the ticket reader.
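The entry/exit logic just described amounts to a simple lookup, which is what makes the occasional misread harmless. The sketch below uses illustrative names; the plate strings and ticket identifiers are invented for the example.

```python
# Sketch of the parking-garage flow: the plate read at entry is linked to a
# ticket; at the exit the plate is read again and the barrier opens only if
# the linked ticket has been paid. A misread plate simply yields False, and
# the driver falls back to inserting the ticket manually.

entries = {}  # plate -> ticket id, recorded at the entry barrier

def on_entry(plate: str, ticket_id: str):
    entries[plate] = ticket_id

def may_exit(plate: str, paid_tickets: set) -> bool:
    """True if the plate was seen at entry and its ticket has been paid."""
    ticket = entries.get(plate)
    return ticket is not None and ticket in paid_tickets

on_entry("AB-123-C", "T0001")
print(may_exit("AB-123-C", {"T0001"}))  # → True: barrier opens automatically
print(may_exit("ZZ-999-Z", {"T0001"}))  # → False: unknown or misread plate
```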

Dependent on application

Even at the current state of knowledge and technology vision in the loop can be used profitably in many situations. But it is not yet a ready-made product, with the exception of the conveyor belt application. Robust vision-in-the-loop solutions for any given application will have to be tailor-made: they involve adapting current technology to particular applications. This requires more than software and electronics. It demands thorough knowledge of the domain and of all possible interferences, as well as the ability to apply technical and scientific knowledge in a practical way.

Case: the quadcopter that flies itself

Is it possible to fly a model quadcopter automatically with the help of three cameras and some software? This was the question we wanted to answer through an experiment we did recently with vision in the loop. The basic material – three cameras and the mini quadcopter – can be bought very easily in retail stores. The quadcopter has a number of distinguishing marks that cameras can trace without too much difficulty: a series of different color fields. The cameras can pick them up very quickly because they’re bright colors that contrast clearly with the surrounding environment.

The images are sent to a computer, which segments them by color. The computer then searches for the color fields in the segmented images. Each side of the quadcopter bears two distinguishing markers, placed at a known distance from each other. The distance between the two markers is measured in each camera image and serves as a measure of the distance between the camera and the quadcopter. This distance can be calculated by calibrating the system prior to the experiment. Calibration is very simple: all that is needed is to wave the quadcopter in front of the cameras.
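The distance estimate above can be sketched under a simple pinhole-camera assumption: the apparent pixel distance between the two markers shrinks in inverse proportion to the camera–quadcopter distance, so a single calibration observation at a known distance fixes the constant. The numbers below are invented for illustration, not measurements from the experiment.

```python
# Sketch of distance-from-marker-separation under a pinhole assumption:
# pixel_separation * distance = constant k, fixed by one calibration sample.

def calibrate(pixel_sep_at_known_dist: float, known_dist_m: float) -> float:
    """One calibration observation gives k = pixel_separation * distance."""
    return pixel_sep_at_known_dist * known_dist_m

def estimate_distance(k: float, pixel_sep: float) -> float:
    """Invert the relation: distance = k / pixel_separation."""
    return k / pixel_sep

# Calibration: the markers appear 80 px apart when the quadcopter is 2 m away.
k = calibrate(pixel_sep_at_known_dist=80.0, known_dist_m=2.0)
print(estimate_distance(k, 40.0))   # markers appear half as far apart → 4.0 m
print(estimate_distance(k, 160.0))  # markers appear twice as far apart → 1.0 m
```

In practice lens distortion and marker detection noise complicate this, which is part of why the article’s “wave it in front of the cameras” calibration step matters.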

The quadcopter’s position can be calculated through triangulation by following it with three cameras from different angles. In order to determine the spatial orientation of the quadcopter, each side has been given a unique marker. This makes it possible to ascertain for each image which side of the quadcopter is in view. This knowledge is necessary to control the quadcopter, a process that is facilitated by software that sends commands to the aircraft. This again requires knowledge of control engineering: you have to know what commands cause the object to move from A to B calmly and in a controlled fashion – that is: proportional control. The software can give the command to fly through the room in circles, or to remain stationary in one particular place. If the aircraft is pushed from its position, it will automatically revert to its starting point.
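The proportional control mentioned above can be sketched in one dimension: the command sent to the quadcopter is simply the position error scaled by a gain, so a push away from the setpoint produces a command that steers it calmly back. The gain value and the simplified dynamics are assumptions for the example, not the parameters of the actual experiment.

```python
# Sketch of proportional ("P") control: command = gain * (setpoint - position).
# A small gain gives calm, controlled motion; too high a gain causes overshoot.

KP = 0.5  # proportional gain (assumed value)

def p_command(setpoint: float, position: float) -> float:
    """Command proportional to the distance from the desired position."""
    return KP * (setpoint - position)

# Simulate a quadcopter pushed 1 m off its hover point; with a 0.5 s time
# step it converges smoothly back toward the setpoint at 0.
pos = 1.0
for _ in range(10):
    pos += p_command(setpoint=0.0, position=pos) * 0.5
print(round(pos, 3))  # → 0.056, i.e. nearly back at the starting point
```

Real controllers often add integral and derivative terms (PID) to remove residual error and damp oscillation, but the proportional term alone already produces the “revert to its starting point” behavior the article describes.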
