Inrix, Big Data & the fine art of anonymity

How do you protect personal privacy while still allowing data to be of use in intelligent transportation? Ahmed Darrat of Inrix offers some thoughts on finding that balance...
January 9, 2025
Anonymisation techniques can include removing location data or rotating device IDs (© Weerapat Kiatdumrong | Dreamstime.com)

In today's rapidly-evolving transportation landscape, probe data from aggregated mobile devices and vehicles has become a critical asset for rapidly scaling mobility insights and traffic safety solutions.

This data has unlocked cost-effective solutions for managing traffic, increasing roadway safety, and planning equitable, sustainable and livable communities. As the landscape evolves, data will continue to serve as the bedrock of innovation.

However, as regulators and data providers enact new data privacy policies, the intelligent transportation industry has moved past “peak GPS”. As an industry, we must balance protecting user privacy with maintaining data utility. It is essential for transportation professionals to focus on the outputs of solutions – like the accuracy and precision of metrics – to help deliver outcomes for the public, rather than on inputs like penetration rates.

To focus on outputs and outcomes in the current data paradigm, we must understand how anonymisation, data quality and data governance impact the pipeline of information from connected devices (e.g. phones and vehicles) in the field to your computer screen.

 

Data anonymisation

Anonymisation and de-identification techniques are designed to protect personal privacy by obfuscating, reducing or eliminating information that could potentially identify an individual. This includes sensitive locations and any personal data such as unique device identifiers (like VIN, MAC address, etc), frequently-visited places or patterns of movement that can be linked to a specific person. Under-anonymising data could reveal personal information or provide the necessary insights to reverse-engineer and obtain sensitive information. On the other hand, over-anonymising can significantly reduce the data’s value for transportation solutions.

For companies seeking to protect not only their customers but also their brands, privacy must be a core value. By committing to protecting data as it is collected, companies can enhance handling practices to ensure privacy protection goes beyond individual data provider standards. These practices include device ID rotation, data scrubbing, start and end point obfuscation, and other advanced anonymisation techniques.
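As a rough illustration of device ID rotation, the sketch below derives a pseudonym from a keyed hash of the device ID and a time window, so the same device appears under a different ID in each window. The function name `rotated_id`, the 24-hour window, the 16-character truncation and the secret key are all assumptions made for this example; they do not describe any particular provider's method.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical secret held only by the data processor; never shipped with the data.
SECRET_KEY = b"replace-with-a-real-secret"


def rotated_id(device_id: str, timestamp: datetime, rotation_hours: int = 24) -> str:
    """Return a pseudonym that changes every `rotation_hours` hours."""
    # Index of the rotation window that the timestamp falls in.
    window = int(timestamp.timestamp() // (rotation_hours * 3600))
    message = f"{device_id}|{window}".encode("utf-8")
    # A keyed hash means the pseudonym cannot be recomputed without the secret.
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]


monday_morning = datetime(2025, 1, 6, 8, 0, tzinfo=timezone.utc)
monday_evening = datetime(2025, 1, 6, 18, 0, tzinfo=timezone.utc)
tuesday_morning = datetime(2025, 1, 7, 8, 0, tzinfo=timezone.utc)

same_day_a = rotated_id("VIN-EXAMPLE-001", monday_morning)
same_day_b = rotated_id("VIN-EXAMPLE-001", monday_evening)
next_day = rotated_id("VIN-EXAMPLE-001", tuesday_morning)

print(same_day_a == same_day_b)  # trips within one window can still be linked
print(same_day_a == next_day)    # trips in different windows cannot
```

A shorter window breaks the link between trips sooner, at the cost of analyses that need to follow a vehicle for longer, which is the privacy and utility trade-off described above.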

 

“For companies seeking to protect not only their customers but also their brands, privacy must be a core value”

 

Anonymisation is not new; for some companies, like Inrix, it has been an important practice since day one. Companies committed to data protection must continuously adapt methods to address evolving privacy requirements while ensuring data quality. As data sources evolve, so do methods of data care.

Transportation and mobility solutions rely on aggregated, anonymised data to provide valuable insights and improve safety. Key use cases include:

•    Optimising signal timing based on traffic flow trends
•    Assessing mobility impacts approaching work zones
•    Performing origin-destination studies 
•    Monitoring daily traffic conditions
•    Identifying measures to improve road safety for all road users 

For these applications and many others, individual user identification is neither necessary nor desirable. However, when selecting a vendor to provide the detailed information necessary to deliver these and other use cases, two primary factors should be considered:

•    Commonly-visited locations can often include homes, workplaces, schools, places of worship or grocery stores. 
•    Device IDs enable connecting multiple trips to the same driver or vehicle. 

When both are known, bad actors could potentially identify the behaviour of individual vehicles.
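One way to limit what commonly-visited locations reveal is to obfuscate the start and end points of each trip. The sketch below simply drops every point within a fixed distance of the true origin or destination. The 300 m buffer, the helper names and the sample coordinates are assumptions for illustration, and real systems may use other approaches, such as randomised buffers.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in metres."""
    radius_m = 6_371_000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(h))


def trim_endpoints(trip: List[Point], buffer_m: float = 300.0) -> List[Point]:
    """Drop points within `buffer_m` of the trip's true origin or destination."""
    if not trip:
        return []
    origin, destination = trip[0], trip[-1]
    return [
        p for p in trip
        if haversine_m(p, origin) > buffer_m
        and haversine_m(p, destination) > buffer_m
    ]


# Synthetic trip heading north in steps of 0.001 degrees latitude (about 111 m).
trip = [(47.600 + i * 0.001, -122.330) for i in range(10)]
trimmed = trim_endpoints(trip)
print(len(trip), len(trimmed))
```

With these sample values, the three points nearest each end are removed and the four mid-trip points are kept, so the trip still contributes to speed and flow metrics on the roads in between.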

Anonymisation techniques can include removing location data or rotating device IDs. Hiding, deleting or obfuscating either of these items increases personal privacy. There are currently at least four approaches to data anonymisation. Each method takes a different approach to protecting personal privacy while supporting transportation use cases.

The table below visualises each of the most commonly-used methods:

 

Data quality

When identifying changes to the data by vehicle OEMs, logistics providers and mobile device companies, it is difficult to discuss the impacts to the outputs of products without venturing into a discussion about inputs. Most often, customers have questions about penetration rates because this is a relatively simple calculation. However, we have found that once the penetration rate is above a certain minimum threshold, any additional increase has limited impact on the products.
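To show why penetration rate is considered a simple calculation, here is a minimal sketch that assumes the common definition: probe vehicles observed on a road segment divided by all vehicles counted there by a reference source such as a permanent count station. The function name and the sample counts are illustrative only.

```python
def penetration_rate(probe_count: int, total_count: int) -> float:
    """Share of all vehicles on a segment that were observed as probes."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return probe_count / total_count


# Illustrative numbers only, not measured values.
rate = penetration_rate(probe_count=60, total_count=1200)
print(f"{rate:.1%}")  # prints 5.0%
```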

There are two main areas of focus when evaluating whether a new data source meets users’ needs:

•    Frequency of data: How often do we receive information from the vehicle or device?
•    Reliability of the data: How sure are we that the data we receive best represents the ground truth?

The frequency of data pings allows for the most advanced use cases. High-frequency data allows us to observe a vehicle’s movement often enough that AI engines can determine much more in-depth information when mixed with detailed information about the context of the location. For example, even at relatively low penetration rates, metrics calculated with high-frequency vehicle data can be highly accurate: the mean absolute percent error for percent arrivals on green is less than 10% when compared to hardware-based systems. Additionally, high-frequency data allows us to calculate turning movement-level data and overall control delay, something low-frequency mobile data solutions and hardware cannot provide.
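For readers who want to reproduce this kind of comparison, the sketch below computes the mean absolute percent error of probe-derived percent arrivals on green against values from a hardware-based system. The per-approach values are invented for the example and are not measurements.

```python
from typing import Sequence


def mean_absolute_percent_error(reference: Sequence[float],
                                estimate: Sequence[float]) -> float:
    """MAPE of `estimate` against `reference`, returned as a percentage."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    errors = [abs(e - r) / abs(r) for r, e in zip(reference, estimate)]
    return 100.0 * sum(errors) / len(errors)


# Illustrative percent-arrivals-on-green values per approach, not real data.
hardware_aog = [50.0, 80.0, 40.0, 60.0]
probe_aog = [55.0, 76.0, 42.0, 57.0]

mape = mean_absolute_percent_error(hardware_aog, probe_aog)
print(f"{mape:.2f}%")  # prints 6.25%
```

Note that the error is relative to the hardware value, so approaches with very low arrivals on green can dominate the average.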

Data reliability is an especially important topic. As data providers restrict data usage based on privacy concerns, more data brokers have entered the market. Because many of these brokers have been paid on a ‘per point’ basis, they are incentivised to provide the largest number of data points possible. As a result, a fair number of these brokers have been plagued with fraudulent data, often replaying historic data. To counter this, solutions providers need to leverage other unplanned, real-time data sources such as incidents and closures to quickly flag and delist potentially fraudulent data.
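The cross-check against closures could look something like the sketch below: replayed historic data tends to show vehicles moving normally along a road that is closed right now. The data classes, field names and thresholds are hypothetical and chosen only to illustrate the idea.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Closure:
    segment_id: str
    start: float  # epoch seconds
    end: float    # epoch seconds


@dataclass(frozen=True)
class ProbePoint:
    provider: str
    segment_id: str
    timestamp: float  # epoch seconds
    speed_kph: float


def suspicious_providers(points: Iterable[ProbePoint],
                         closures: Iterable[Closure],
                         min_speed_kph: float = 20.0,
                         max_share: float = 0.01) -> List[str]:
    """Flag providers whose points too often show traffic moving on closed roads."""
    closed: Dict[str, List[Closure]] = defaultdict(list)
    for c in closures:
        closed[c.segment_id].append(c)

    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for p in points:
        totals[p.provider] += 1
        moving_through_closure = p.speed_kph >= min_speed_kph and any(
            c.start <= p.timestamp <= c.end for c in closed.get(p.segment_id, [])
        )
        if moving_through_closure:
            hits[p.provider] += 1

    return sorted(name for name in totals if hits[name] / totals[name] > max_share)


closures = [Closure("seg-42", start=1000.0, end=5000.0)]
points = [
    ProbePoint("broker_a", "seg-42", 2000.0, 5.0),   # crawling, plausible near a closure
    ProbePoint("broker_a", "seg-07", 2000.0, 60.0),
    ProbePoint("broker_a", "seg-07", 2500.0, 62.0),
    ProbePoint("broker_b", "seg-42", 2000.0, 90.0),  # free flow on a closed road
    ProbePoint("broker_b", "seg-42", 3000.0, 88.0),
    ProbePoint("broker_b", "seg-07", 3000.0, 61.0),
]
flagged = suspicious_providers(points, closures)
print(flagged)  # prints ['broker_b']
```

A flag from a check like this is a prompt for review rather than proof of fraud, since closure feeds themselves can be late or imprecise.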

 

Data governance

When identifying the utility of data coming from a specific solution, it is essential that users understand the governance of that data from the point it was created until the insights and outputs are delivered. This can be done with qualitative information without revealing intellectual property or trade secrets.

When the industry faced a data disruption, solutions providers had a choice: obtain additional data sources or outsource processing and insights to a third party. Many solutions providers pivoted their business models: instead of obtaining raw data, applying their intellectual property and delivering their own insights, they began visualising third-party insights.

While this approach may help to quickly go to market with a wide array of features, it can be a significant problem for the solutions providers’ end users. Without the vertical integration of raw data processing, delivery of insights, and development of user interfaces, end users are locked into a product even if a specific feature is missing or the product doesn’t work for them. For example, a solutions provider that only built a visualisation tool doesn’t have direct control of its supplier’s roadmap and cannot be fully responsive to improvements or corrections.

In contrast, Inrix brought on numerous additional data sources to fill gaps and delivered a wide range of new algorithms that allowed data from existing providers to be used across more products. In the end, we maintained complete control over our products and roadmaps.

 

Navigating the path forward

In this evolving landscape, organisations need to carefully evaluate their data sources to ensure they align with their privacy and data utility goals. When assessing potential data providers, consider the following key questions and factors:

1.    What is the source of the data (e.g. passenger vehicle, freight, local delivery or mobile devices)?
2.    How accurate and frequent is the GPS signal? 
3.    What processes are in place to filter out non-transportation-related signals and identify potential biases?
4.    How does the provider handle data fraud and ensure the integrity of its data?
5.    What is the provider's approach to privacy and anonymisation, and how does it balance privacy against data utility?
6.    How much control does the provider have over the processing of raw data to deliver and visualise insights?

The power of data lies in its ability to transform lives and solve complex problems. As privacy concerns take centre stage, companies that continue to innovate and develop privacy-preserving solutions that benefit our communities will drive the industry forward.

The end goal should be a connected device and probe data market that ensures the development of technologies that protect the privacy of individuals while enabling critical mobility and safety solutions.

ABOUT THE AUTHOR
Ahmed Darrat is Inrix chief product officer


 
