Big data, high-speed data: data streams are increasing in size and speed. The tendency to store and process everything in the cloud is strong. Storage has never been this cheap, and bandwidth and processor power seem infinite. Nonetheless, there are good reasons to handle certain tasks on local devices rather than in the cloud.
Article from Bits&Chips #8, December 2017
We all have our heads in the cloud. We count on there always being enough capacity to store anything we want. After all, there is always room for more data in the cloud. The same applies to bandwidth and processor power: data transport and data processing become faster each year. For new applications, the answer to the question 'will we be using the cloud for central management?' is an almost indisputable yes.
However, here at Technolution, we find that the other side is often not as clearly defined: what about the production of data? Thanks to the cloud, systems are more distributed than ever before. The growth of the IoT is astounding. All 'things' generate their own data stream: street cameras, mobile devices, smart thermostats and energy meters, traffic sensors, navigation systems; the list goes on and on and shows no sign of ending anytime soon. And most 'things' are only humble producers of data. However, in some situations the production of data increases faster than the storage and transport capacity. This is when we hit the limitations of the cloud.
As a technology integrator, we specialize in innovative solutions with software and electronics. Most of our clients have an obvious preference for central processing and storage of their data, including all raw data. This is quite understandable, since a central application offers the most control and is easy to maintain and update. Storing raw data also appears to have its benefits. After all, you never know when you might need the data for new applications.
During our meetings with the clients, we try to get some insight into their requirements. Will central management provide enough performance? How much data will be transferred to and from the cloud? Can the devices at the edges of the system be easily replaced? Are there situations that require local processing? Is it even possible to arrange raw data in such a way that they can be used at a later point in time? How much development time should be taken into account? During these meetings, we often experience a shift in point of view towards more local, distributed intelligence in the system.
Sometimes central storage and processing is required. For example, this might be the case when neural networks have to be trained. Such training works best when the network receives as much raw data as possible. One of our clients uses pre-emptive analytics to predict when an electric motor will require maintenance. A sensor in a nearby control cabinet measures the power running through the motor at a high sampling frequency. This results in large amounts of raw data that are recorded in a central back office. By applying deep learning to these data, the software is trained to recognize abnormal patterns and thus predict when an electric motor is at risk of encountering problems. If this occurs, the end user receives a signal that the motor is in need of attention.
Nonetheless, it remains to be seen whether a centralized approach is still the best choice once the neural networks have been properly trained. The enormous cloud database of raw data has then become obsolete, and the software can perhaps run better on a local system, closer to the electric motor. This saves a great deal of storage space and bandwidth without compromising functionality.
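To illustrate the idea of moving the analysis next to the motor, here is a minimal sketch. It is not the client's system and not a neural network: a simple RMS comparison stands in for the trained model, and the names `detect_anomalies`, `baseline_rms` and `tolerance` are assumptions made for this example. The point it shows is that only the indices of suspicious windows would need to leave the device, not the raw samples.

```python
import math
from typing import List, Sequence


def rms(window: Sequence[float]) -> float:
    """Root mean square of one window of samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def detect_anomalies(
    samples: Sequence[float],
    window_size: int,
    baseline_rms: float,
    tolerance: float,
) -> List[int]:
    """Return the indices of windows whose RMS deviates from the baseline.

    Stand-in for a trained model (assumption): a window is flagged when its
    RMS differs from `baseline_rms` by more than `tolerance`, expressed as a
    fraction of the baseline. Incomplete trailing windows are ignored.
    """
    flagged = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        if abs(rms(window) - baseline_rms) > tolerance * baseline_rms:
            flagged.append(start // window_size)
    return flagged


if __name__ == "__main__":
    # Synthetic signal: normal level 1.0 with one disturbed stretch at 2.0.
    signal = [1.0] * 100 + [2.0] * 50 + [1.0] * 50
    events = detect_anomalies(signal, window_size=50, baseline_rms=1.0, tolerance=0.2)
    print("windows to report to the back office:", events)
```

In this sketch, 200 raw samples are reduced to a short list of window indices, which is the kind of saving in storage and bandwidth described above.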
Other situations more or less demand local processing. Sometimes the amount of data is so large that it is no longer realistic to transport it in real time or to store it. This is not a recent problem. In the sixties, CERN's particle accelerator in Geneva already generated an astronomical amount of measurement data during each experiment, far exceeding the available storage capacity. Even today, the institute can only store a fraction of the data produced.
We do not have to look far for modern-day examples. The surface of the earth is constantly being observed by satellites. There is a constant flow of high-resolution images and measurement data to receivers on the surface. Real-time storage of raw data is often not feasible. Smart algorithms determine immediately after receipt which information needs to be saved and which can be discarded. On the other end of the imaging spectrum, we find electron microscopes that can generate dozens of gigabytes per second for days in order to assemble a volume in 3D, with petabytes of data as a result. For these applications there is no practically feasible technical solution to store all the data. Local intelligence is required to reduce the amount of data.
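A quick back-of-the-envelope calculation shows how "dozens of gigabytes per second for days" adds up to petabytes. The figures used below (24 GB/s for two days) are illustrative assumptions, not measurements from a specific microscope, and the function name is made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def raw_volume_petabytes(rate_gb_per_s: float, days: float) -> float:
    """Total raw data volume in petabytes for a sustained data rate.

    Uses decimal units: 1 PB = 1,000,000 GB.
    """
    total_gb = rate_gb_per_s * days * SECONDS_PER_DAY
    return total_gb / 1_000_000


if __name__ == "__main__":
    # Assumed example: 24 GB/s sustained for 2 days.
    print(f"{raw_volume_petabytes(24, 2):.2f} PB")
```

With these assumed numbers the result is a little over 4 PB for a single acquisition, which makes clear why the data has to be reduced close to the source.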
Local intelligence is also crucial when a high reaction speed is required. For example: when the visual sensors of a self-driving car detect an obstacle in the road, there is no time to communicate with the cloud. The car has to analyze the data immediately: is it a newspaper, a ball or a child on the road? Does the car need to swerve or make an emergency stop? It will have to make this decision autonomously.
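To get a feel for why there is no time to consult the cloud, consider how far a car travels while it waits for an answer. The speed and round-trip time below are assumed values chosen for illustration only; real network latencies vary widely.

```python
def distance_during_delay_m(speed_kmh: float, delay_ms: float) -> float:
    """Distance in metres a vehicle covers during a given delay."""
    speed_m_per_s = speed_kmh * 1000 / 3600
    return speed_m_per_s * delay_ms / 1000


if __name__ == "__main__":
    # Assumed example: 50 km/h in town, 100 ms round trip to a remote server.
    print(f"{distance_during_delay_m(50, 100):.2f} m travelled before any answer arrives")
```

Under these assumptions the car covers well over a metre before a remote answer could even arrive, and that is before any processing time is counted.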
Decentralized intelligence is, in some cases, an architectural decision. For a high-tech client, we are designing a system that directs the movements of a wafer during chip production. The wafer needs to move uniformly with a precision of nanometers. Like a self-driving car, this requires constant and direct adjustment. The system continuously monitors the movement and the position of the wafer with a large number of sensors that perform measurements at a frequency of several kilohertz. The analogue measurement data are immediately digitized, processed and translated into instructions for the motor that controls the movement of the wafer. In this control loop, the largest part of the intelligence is located near the sensors. The decentralized elements translate the analogue measurements into position information, which is transferred to the central system via Ethernet interfaces. This way, they can represent their data in a uniform manner. The central application can limit itself to relatively simple instructions for controlling the wafer movements. Most of the controls and corrections are, after all, dealt with in a decentralized manner.
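The division of labour in such a control loop can be sketched in a few lines. This is a deliberately simplified toy model, not the actual wafer system: the class `EdgeNode`, the linear calibration factor `volts_to_nm` and the proportional `gain` are all assumptions made for the example. The edge node turns a raw sensor reading into a position and computes the correction locally; the central side would only need the uniform position values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeNode:
    """Hypothetical sensor-side element with local intelligence."""

    volts_to_nm: float  # assumed linear sensor calibration
    gain: float         # assumed proportional controller gain

    def position_nm(self, raw_volts: float) -> float:
        """Translate a raw analogue reading into position information."""
        return raw_volts * self.volts_to_nm

    def correction_nm(self, raw_volts: float, setpoint_nm: float) -> float:
        """Proportional correction, computed next to the sensor."""
        return self.gain * (setpoint_nm - self.position_nm(raw_volts))


def simulate(node: EdgeNode, start_nm: float, setpoint_nm: float, steps: int) -> List[float]:
    """Run the local loop and return the positions reported upstream."""
    position = start_nm
    reported = []
    for _ in range(steps):
        raw_volts = position / node.volts_to_nm  # simulated sensor reading
        position += node.correction_nm(raw_volts, setpoint_nm)
        reported.append(position)
    return reported


if __name__ == "__main__":
    node = EdgeNode(volts_to_nm=1000.0, gain=0.5)
    for step, pos in enumerate(simulate(node, 0.0, 100.0, 5), start=1):
        print(f"step {step}: {pos:.2f} nm")
```

In this toy model each iteration halves the remaining error without any round trip to a central system, which only has to supply the setpoint.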
It is therefore important to evaluate each situation properly before choosing between central or local processing of the data. The centrally controlled cloud application will most definitely not disappear altogether. It is often still the best option, for example when data security has a high priority or when lots of experiments and flexibility are required to acquire the correct algorithm. It has, however, become apparent that cloud computing has its limitations.
The speed limitations of data transport and the size of the raw data, in particular, often tip the decision towards implementing more intelligence at the edges of the system: the sensors and actuators. A suitable name has already been chosen, which is not really surprising considering this is the field of IT: edge computing. Edge computing is a clear trend that will reduce the pressure on the cloud considerably. Does this mean we will lose some of the raw data? Absolutely. Is this a problem? We do not think so. After all, it is the information included in the data that is vital, not the data itself.
Marc van Eert is an applied scientist at Technolution in Gouda. Anton Hoexum is a corporate writer.
Editing Nieke Roos