It’s not a new concept anymore, but big data is currently very much in the spotlight, and that includes the world of traffic and transportation. But what exactly is big data? What uses does it have currently? And what possibilities are there for the near future?
Article from NM-Magazine, June 2014 – available also in PDF (Dutch only)
Working with big data can best be described as working with very large data files deriving from several data sources. The rapid succession of data streams is also an important element. These characteristics are often summarized as volume, variety and velocity. How big data files must be to qualify as ‘big’ has not been defined precisely and differs per field. Gigabytes are seen as big in some areas, while in other domains it’s more about petabytes (1,000 terabytes). As long as processors continue to become faster and memory capacity cheaper, the notion of ‘big’ will continue to evolve within each domain.
Big data in the mobility domain
Working with big data files is nothing new in our field. The traffic signaling system MTM was developed in the 1970s, for instance, and was widely rolled out beginning in 1988. Since then it has been processing large volumes of measurement data on a 24/7 basis. Nevertheless we have only recently begun to speak about ‘big data in traffic and transportation’. This isn’t only due to the hype that surrounds the concept. It’s a fact that new mining techniques and faster processors have created more volume, more variety and more velocity. The data we are collecting is also more easily available. To give just one example: the Dutch National Data Warehouse for Traffic Information (NDW) collects and processes some 216 million data records per day – and all of this information is freely available to interested parties. Another factor is that the number of possible applications has grown. In the past, a lot of data was only interesting to traffic control center staff, but nowadays data streams find their way much more easily to (commercial) apps, to apps geared to logistics companies, and to road users.
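To put the NDW figure of 216 million per day in perspective, a back-of-the-envelope calculation converts it into an average rate per second. The sketch below assumes the load is spread evenly over the day, which real traffic data is not; the function and variable names are purely illustrative and do not refer to any NDW system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def average_rate_per_second(records_per_day: int) -> float:
    """Average records per second, assuming an even spread over the day."""
    return records_per_day / SECONDS_PER_DAY


ndw_records_per_day = 216_000_000  # figure quoted in the article
rate = average_rate_per_second(ndw_records_per_day)
print(f"{rate:,.0f} records per second on average")
```

Under that simplifying assumption the daily total corresponds to 2,500 records per second; peak-hour rates will in practice be higher than this average.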
The first big data applications…
What results have these first big data steps delivered for our field? The most important achievement so far is that our picture of the traffic has been broadened and sharpened. For a long time we only knew what was going on on the main road network, because that is where Rijkswaterstaat had invested in induction loops. But thanks to GPS and GSM data, we now also know what is happening on the thousands of kilometers of provincial and municipal roads, and we know it in near real time. The same goes for the kind of information we have. For instance, induction loops measure intensity, but don’t tell you anything about the origin-destination relations of the traffic that has been detected. It’s the opposite with floating car data: that doesn’t say anything about intensity, but it does allow you to reconstruct the origin-destination relations. Thus the two sources complement each other. Another example is that until recently it was not possible to map the foreign share of road users or visitors. Using GSM data now enables us to do this. This broader and sharper picture of the traffic, which we owe to a bigger and more varied data stream, lies at the basis of almost every serious development that our field has seen in recent times. Take network management, for instance: would it be possible to intervene locally if you didn’t know how traffic moves along the different road networks, where the bottlenecks are and where there is still space? Impossible.
Data has also made our traffic models more precise and more reliable. We are currently able to make short-term predictions with a reasonable degree of precision, which makes it possible to take proactive measures. More reliable long-term predictions are a blessing for policymakers: they can now calculate the consequences of different accessibility policies in advance. Policy has become more transparent because of the available data. Previously, we had to rely on time-consuming and sometimes not very representative surveys; now we have a huge reservoir of measurement data which allows us to reconstruct how the traffic really responded to the approach that was chosen. Finally, there is the large number of passenger information services that are being set up – they too rely on data. It’s interesting to note that these applications are increasingly able to make excellent use of ‘small data’ sources: PRIS data for the occupancy of car parks, details from traffic light control systems about the traffic lights at an intersection that lies ahead, current planning information from road managers about road works, etc.
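The way induction loops and floating car data complement each other, as described earlier in this section, can be illustrated with a small sketch. It takes origin-destination shares from a handful of sampled trips and scales them to the vehicle count measured by one loop. Everything here is hypothetical: the zone names, the trips, the loop count and both function names are invented for illustration. The sketch also assumes that the sampled vehicles are representative of all traffic passing the loop, which is a strong assumption in practice.

```python
from collections import Counter

# Hypothetical floating car traces: each trip is the ordered list of
# zones that one sampled vehicle passed through.
trips = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "B", "C"],
]


def od_counts(trips):
    """Count origin-destination pairs, using the first and last zone of each trip."""
    return Counter((trip[0], trip[-1]) for trip in trips if len(trip) >= 2)


def scale_to_loop_count(trips, loop_zone, loop_count):
    """Spread a loop's measured vehicle count over the OD pairs of the
    sampled trips that pass the loop (assumes a representative sample)."""
    passing = [trip for trip in trips if loop_zone in trip]
    shares = od_counts(passing)
    total = sum(shares.values())
    if total == 0:
        return {}
    return {od: loop_count * n / total for od, n in shares.items()}


sample_od = od_counts(trips)
estimated = scale_to_loop_count(trips, loop_zone="B", loop_count=1000)
for od, flow in sorted(estimated.items()):
    print(od, round(flow))
```

The floating car data supplies the pattern (which origins and destinations are connected through zone B) and the loop supplies the volume; neither source could produce the estimated flows on its own.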
… but they could be much bigger!
At the same time we have to acknowledge that the current applications are only the beginning. As illustrated in the figure on page 11 of the PDF, the big data universe is continuously expanding. At the moment we are more or less in the second layer, that of the minute details and sources such as loops and floating car data. As we remarked above, this has ensured that we know more about the traffic: our picture has become broader and sharper. But have we also obtained a deeper understanding of traffic? Or, more importantly: of the traveler? Not yet. But it is precisely insight into human behavior that is one of the most interesting promises of big – or in our case: bigger – data. The social behavior of human beings has always been difficult to ‘measure’. Sociologists were limited to methods such as surveys and interviews, in which a small, ‘representative’ group of people could state their preferences or had to explain afterwards what they had done and why. But big data can drastically improve this. Looking at continuous data streams such as public transport chip cards, smartphones, navigation systems, CAN buses, connected vehicles and social media means collecting an ever-growing mass of data that offers insight into the actual behavior of human beings. In this way, sociology becomes social physics. Alex Pentland, professor at the Massachusetts Institute of Technology and a prominent advocate of working with big data, fittingly calls the method of extracting images of our social interactions from data ‘reality mining’.
What could this revolution mean for the world of traffic and transportation? Well, for instance, that we understand better what motivates people to travel, what their reasons are for choosing the car or public transport, what routes they prefer, when they are in a hurry and to what extent they adapt their driving behavior accordingly in terms of speed or overtaking. We will be able to learn how groups travel, which origin-destination relations translate into road travel and how this evolves over the year. We will discover how price developments and fluctuations of the economy affect travelers’ mobility behavior. The trajectories that people follow near large events can be mapped, as well as the interaction between groups coming from different origins. Finally, we will be able to understand the ‘moving human being’!
We’re not quite there yet. But it is wise to treat these possibilities as the proverbial point on the horizon so that we can steer our endeavors in the right direction. What are we still waiting for? And what are the obstacles that we are going to meet on our way? Social media will be able to offer deeper insights, especially when it comes to interpreting mobility data. But our big data pool will be at its biggest once extended floating car data (xFCD) becomes available on a large scale. That will allow us to see how drivers behave on the road, what speeds they drive at and what following distances they keep, as well as how they brake and accelerate and the impact that this has on road safety and on the traffic flow. Until that time we’ll have to try and get the most out of the available sensors – see also the text box in the PDF about current data sources – while we use surveys and trial projects to prepare for the new data. There are already a lot of things going on at the moment in the field of social media interpretation, as is evident from the services provided by companies such as Greencorn.
… and obstacles
One obstacle we are sure to meet on the way to bigger and biggest data is the openness of the data involved. Many sources are locked in order to protect the privacy of the ‘data suppliers’, i.e. in our field the travelers. But there are also (fully legitimate) commercial issues at play: big data is worth money! The result is that the commercial collectors of data – and they are responsible for most of the new data sources – tend to hoard their databases. If they supply any data to third parties at all, the data involved usually consists of abstracted information, such as floating car data, and not raw data. However useful it is to abstract data, even for the more current applications, it does mean a lot of information about underlying patterns and structures is lost. Speed data and travel times on the basis of floating car data, for instance, are very useful for network management, but they don’t reveal anything about origin-destination relations or about individual driving behavior. It’s said sometimes that more data will inevitably become open as time passes, but this usually means abstracted, processed data. Freeing up raw data is a considerably more complex story, in which issues such as privacy and the commercial value of the data involved deserve serious attention.
Another obstacle on the way to ‘bigger data’ more specifically concerns the data collected through roadside systems, such as data from induction loops, license plate cameras and Bluetooth measuring stations. This roadside approach does not conform to the principle of reciprocity that should be part of the data collection process: people permit their data to be collected in return for a service that is useful to them. This is the principle behind recognized big data collectors such as Waze, Google Maps, Facebook or Twitter. Building big databases by collecting information without people’s free consent will eventually cause resistance. A striking example is the public debate that has arisen about the dense network of license plate cameras used by rush hour avoidance schemes. This form of collecting data pushes against the limits of what society will accept.
How can this problem be tackled? At the moment only the large internet companies and service providers have properly enshrined the principle of reciprocity. For instance, they offer navigation as part of a wider vision of information provision, often linked to a smartphone app. This offering means they are best placed to collect an increasing amount of big data. But their only interest is in helping individuals – they don’t work to serve a collective ‘network interest’.
Road managers need better data to be able to increase the quality of their network management, and this means they do have a ‘network interest’. What tangible compensation can they offer travelers, so that travelers are willing to part with data about their own behavior? In the long run, vehicle-infrastructure communication as part of cooperative systems might help. The compensation in that case consists of a heightened sense of safety and comfort.
A more fundamental issue is perhaps that road managers will have to learn to communicate the individual value that safeguarding network performance delivers: they have to learn to bind road users to network performance. Another option is to leave this challenge to market players; this is the strategy adopted by the action program Connecting Mobility and its Route map. In that case, road managers will at least have to learn to communicate their own regulations and control scenarios – and particularly the motivation behind these – to the service providers.
To sum up, we could say that the phenomenon of big data is much more than hype. The increasing stream of data has already fundamentally changed our field of work, especially because it has given us a much sharper and broader picture of the situation on the road. At the same time, we have to conclude that the real revolution has yet to take place. Our big data has to become much bigger, and in particular it has to include data about the behavior of individual road users. But we will have to overcome a few obstacles before we reach that point, such as the ‘opening up’ of raw data and creating reciprocity in roadside data collection. If we manage to deal with these issues, however, and it becomes possible to use the really big data sources, then the possibilities are huge. Really understanding travelers – this is what will revolutionize our field!
Marco Puts, researcher at CBS (Statistics Netherlands): “Big data is actually a strange term for this type of data. Instead of big data we should be speaking of ‘wild data’ or ‘unrefined data’. An important characteristic of this kind of information is that the proportion of ‘noise’ is so high that it is necessary to filter out the information (the signal). Nate Silver’s bestseller is called ‘The Signal and the Noise’ for a reason. Researchers of big data are like gold seekers standing in riverbeds with huge sieves to separate tiny lumps of gold (the information) from the sand (the noise). The big challenge that we are facing therefore is to develop techniques that will allow us to separate the signal from the noise, so that we are able to meet information needs as well as we can.”
Frits Brouwer, director of NDW: “You can only speak of big data if several data sources, both professional and non-professional (for instance from social media), are used for a policy goal that is wider than what was originally possible. We’re currently in the process of deciding whether we should also include meteorological information in the NDW’s historical database. It can be interesting for traffic engineers to know whether the road surface was dry or wet at the time a traffic jam developed or an accident occurred. How long will it be before cars will be telling us whether their windscreen wipers are on, and we store this information at NDW? You can really only start to speak of big data once you also begin to filter Twitter details about what people on the spot are communicating about the cause and therefore the duration of the traffic jam.”
Hans van Lint, professor of Traffic Simulation at Delft University of Technology: “I see incredibly exciting possible applications for big data, as long as we combine it and fuse it with the data we already have and the knowledge that already exists. That is important not only for my PhD students, but for anyone who uses models to make predictions about traffic and transportation. Knowledge of traffic and transportation begins and ends with data. Making sense of big data begins and ends with knowledge of traffic and transportation.”
Dr Peter van der Mede is Big Data Consultant and Business Developer at DAT.Mobility, Goudappel.
Peter Verwaaijen is Director of Information Technology and Mobility at Vialis.
Philip Tailleu, MBA, is Managing Director of FLOW nv.