The Data Act and PPT

Which privacy preserving technologies can help share data safely in light of the new Data Act, and how do they do so?

On January 11, 2024, the European Data Act entered into force in the European Union (EU). It is aimed at facilitating the free flow of personal and non-personal data in Europe. It establishes harmonized rules for making data generated using connected products (specifically IoT devices) accessible to other parties, thereby enabling its reuse for new purposes. The act also clarifies who can derive value from data and under what conditions, fostering a fair distribution of data value.

With the implementation of this regulation, data circulation is expected to increase sharply, bringing both benefits and potential problems. Among the problems, two stand out: the protection of industrial trade secrets and the protection of individuals. Indeed, it will likely become easier to identify data subjects by combining new data with data already available. Given this new regulatory landscape, which privacy-preserving technologies, particularly differential privacy, can help share data safely, and how?

In this article:

What is the European Data Act, and what does it entail?
- What led to the Data Act?
- How does the Data Act work, and who does it apply to?
What does a new quantity of available data entail?
- Advantages
- Threats
How Privacy Preserving Technologies Can Help Securely Disseminate Data
- What are PPTs, and how do they work?
- How can differential privacy help?
Conclusion

What is the European Data Act, and what does it entail?

What led to the Data Act?

The path leading to the adoption of the Data Act has its roots in the EU’s acknowledgment that over 80% of industrial data remains untapped due to the absence of a suitable framework regulating its access.¹ Consequently, it introduced the Data Act as a pivotal component of its data strategy, aiming to grant access to data generated by all interconnected devices.

With the implementation of this new regulation, a substantial surge in data, both personal and non-personal, is anticipated. This data can be analyzed to foster the development of novel products, create more efficient services, and enhance emergency management.² The Data Act serves as the industrial counterpart of the GDPR and seeks to safeguard Europe’s global competitive stance in the realm of artificial intelligence.

To grasp the significance of this regulation, particularly the massive data release it entails, it is crucial to consider the concept of open data. OpenMined, an organization established by a researcher from Google and DeepMind,³ elucidates in its manifesto that “over the past 20 years, some artificial intelligence algorithms have directly impacted over 12 billion hours of people’s time each day.”⁴ The development of artificial intelligence systems with the potential to enhance or automate global work, furnishing superintelligent capabilities, is underway. Consequently, data sharing envisaged by the Data Act is imperative for two reasons. Firstly, it fosters the continual creation of superior products and services. Secondly, it facilitates an understanding of the ramifications of this superintelligence on our lives. Presently, researchers lack sufficient access to artificial intelligence models or datasets, leaving us largely unaware of artificial intelligence’s comprehensive impact on the world. In essence, we lack the requisite data to comprehend the data itself. The Data Act is intended to radically change this situation.

How does the Data Act work, and who does it apply to?

The Data Act will apply from September 12, 2025, with some obligations phasing in later. It applies to both personal and non-personal data, but in the case of the former, the GDPR takes precedence. The regulation focuses on data collected by connected devices and distinguishes between product data and related service data, from which readily available data can be derived.

The Data Act imposes a range of obligations, including⁵:

- obligations for manufacturers to design their products so that data generated or captured by those products are available to users of the product for free and, ideally, directly;
- obligations for service providers to allow their users to access, reuse, and share data collected through their products and related services free of charge;
- rights to allow access to data by third parties upon the user’s request or for legal obligations, including readily available data and relevant metadata;
- measures regulating contractual terms in data sharing contracts between parties, such as data holders and users or third parties;
- standards to facilitate the transition between cloud service providers and other data processing services, eliminating pre-commercial, commercial, technical, and organizational barriers;
- mechanisms for public bodies to access private sector data in case of public emergencies.

The new obligations may require organizations to make previously proprietary data accessible to users and to roll out new, Data Act-compliant contracts. They will also apply to a broad range of products generating non-personal data (for example, industrial and commercial machines sold business-to-business), which were previously largely unregulated under EU data laws but will now need to be reassessed.

What does a new quantity of available data entail?

Advantages

The scope of the law covers every object that can be connected to the internet, ranging from cars and trucks to wind turbines, industrial robots, dishwashers, coffee machines, smart speakers, and watches. This comprehensive coverage means that every sector in Europe will be impacted.

This regulation brings anticipated changes for many stakeholders. Notably, the International Road Transport Union (IRU), representing various mobility and logistics companies, including trucking, bus, and taxi companies, stands as a major supporter. For them, data generated from vehicle usage holds significant commercial relevance. Information such as fuel consumption, wear and tear, and driver behavior is crucial for driver training and safety measures.⁶

The regulation offers transport operators the opportunity to access data without the continual negotiation with vehicle manufacturers. This streamlining leads to several advantages, including the elimination of the need for installing separate devices for data collection. Additionally, industries like wind turbines, which produce substantial non-personal industrial data such as wind speed and direction, can leverage this regulation to provide supplementary services like remote management of blade positioning.⁷

Threats

Making available the roughly 80% of industrial data that currently goes untapped will inevitably entail risks, some more evident than others. Among the obvious risks is the concern that sharing could expose commercially sensitive data, stimulating the creation of imitations that directly compete with large European companies.

A significant example is provided by Siemens’ healthcare division, known for manufacturing CT scanners and MRI machines. To generate images, these scanners must acquire raw data, such as X-ray data for CT scans and pulse data for MRI scans. Sharing such data could, in the wrong hands, facilitate the reverse engineering of crucial innovations. Given the current geopolitical tensions, the risk of exploitation is even more pronounced.⁸

The regulation appears to address this issue through the “trade secrets clause.” According to Articles 4 and 5, the holder of data generated from the use of products or services may invoke trade secret protection as a defense against excessively broad data access requests. However, this is only an exception to the rule that mandates data provision under the Data Act. In essence, the regulation seems to ensure the protection of trade secrets. Indeed, if this were not the case, it would compromise the very purpose of the regulation, which is to confer a competitive advantage on European companies.⁹

The less evident risk, however, is that of re-identifying the subjects to whom the data belongs or who are somehow linked to it (see also our article, “The Most Common Data Anonymization Techniques”). Even in the absence of issues related to industrial secrecy, the threat of re-identification persists. Returning to the example in the medical sector, raw data related to X-rays or heart rate can, once coupled with other already available data, allow for the identification of individuals. This is not mere speculation; it is already a reality.

In our article, “The Impact of Privacy Preserving Technology on Data Protection,” we saw how, in an experiment conducted by the Mayo Clinic in which imaging and facial recognition technologies were used to link study subjects to their MRI scans, a facial recognition program correctly matched 70 of 84 subjects to anonymized images.¹⁰

In fact, even though scans can be anonymized by removing personal and sensitive data, neural networks can reconstruct the facial features of removed faces and de-anonymize the images.

True data anonymization is difficult to achieve because data can be combined with other public datasets and de-anonymized. A well-known example is the re-identification of some users in a large, anonymized Netflix dataset by two researchers who cross-referenced it with public movie ratings.¹¹ We have also seen how researchers at the University of Oxford have shown that age and sex can be deduced from a structural brain image or identified directly from heart rate.¹²
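To make the combination step concrete, here is a minimal sketch of a linkage attack in Python. Every record, name, and field in it is invented for illustration; the only assumption is that an "anonymized" dataset and a public one share a few quasi-identifiers (here postal code, birth year, and sex).

```python
# Hypothetical data: all names, codes, and findings are invented for illustration.
anonymized_scans = [
    {"scan_id": "S1", "zip": "10115", "birth_year": 1958, "sex": "F", "finding": "arthritis"},
    {"scan_id": "S2", "zip": "20095", "birth_year": 1990, "sex": "M", "finding": "none"},
    {"scan_id": "S3", "zip": "80331", "birth_year": 1975, "sex": "M", "finding": "arrhythmia"},
]

public_register = [
    {"name": "Alice Example", "zip": "10115", "birth_year": 1958, "sex": "F"},
    {"name": "Ben Example", "zip": "20095", "birth_year": 1990, "sex": "M"},
    {"name": "Bruno Example", "zip": "20095", "birth_year": 1990, "sex": "M"},
    {"name": "Carlos Example", "zip": "80331", "birth_year": 1975, "sex": "M"},
    {"name": "Dana Example", "zip": "80331", "birth_year": 1982, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Return {scan_id: name} for every record whose quasi-identifiers
    match exactly one person in the public dataset."""
    # Index the public dataset by its quasi-identifier combination.
    index = {}
    for person in public:
        combo = tuple(person[k] for k in keys)
        index.setdefault(combo, []).append(person["name"])

    matches = {}
    for record in anonymized:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means re-identification
            matches[record["scan_id"]] = candidates[0]
    return matches


reidentified = link_records(anonymized_scans, public_register)
print(reidentified)
```

In this toy data, S1 and S3 each match a single person and are re-identified, while S2 matches two people and stays ambiguous. Neither dataset contains a name next to a medical finding; the link only appears once the two are joined.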

This is precisely the hidden risk brought by the Data Act. Releasing the bulk of previously untapped machine data, even if seemingly impersonal or anonymized, could enable re-identification in situations where no one would expect one piece of data to reveal another. The data flow resulting from the regulation is therefore likely to increase the risk of re-identification substantially. Without appropriate care and technical measures, more available data does not automatically translate into greater benefits; quite the opposite.

How Privacy Preserving Technologies Can Help Securely Disseminate Data

What are PPTs, and how do they work?

In our article, “The Impact of Privacy Preserving Technology on Data Protection,” we discuss in detail what privacy preserving technologies (PPTs) are and how they work. Here, we will discuss how they can be of help in light of the new Data Act.

Keeping data safe is more important than ever today. PPTs are changing how we manage and protect information. These technologies aim to enable secure management, sharing, and processing of data, minimizing the risk of unauthorized access, disclosure, or compromise. Of the risks brought about by the Data Act, the compromise of shared data is arguably the most serious.

We’ll take a quick look at three of them, with particular focus on the Data Act, before discussing differential privacy:

- Federated learning is a machine learning approach that allows training models on decentralized devices while keeping data local. This occurs without centralizing raw data, thus reducing the need to transfer sensitive information to a central server. This technology has practical applications in healthcare, where patients’ sensitive data and, soon, machine data are exchanged.
- Secure multiparty computation (SMPC) is a cryptographic technique that allows multiple parties to jointly compute a function on their inputs while keeping those inputs private. Essentially, SMPC distributes computation among parties so that none of them can learn anything beyond its own input and the final output.
- Homomorphic encryption is a cryptographic technique that allows computations to be performed on encrypted data without decrypting it. In other words, it enables keeping data encrypted while performing mathematical operations on it. With reference to the Data Act, it would make it possible to find data on people with arthritis from connected devices such as wearables, run calculations on it, and create a useful model based on group-level insights without ever decrypting personal records. Homomorphic encryption is gaining popularity, and it is hoped that one day, almost all computation will be done on encrypted data.
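As an illustration of the SMPC idea, the following is a minimal sketch of additive secret sharing, one of the simplest building blocks used in such protocols. The scenario (three operators summing a private figure), the function names, and the choice of modulus are assumptions made for this example; real protocols also need secure channels and protection against dishonest parties.

```python
import secrets

# Assumed modulus for the demo; all arithmetic on shares is done modulo this value.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split `value` into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate n parties computing the sum of their inputs.

    Each party splits its input into shares and sends one share to every
    party. Each party then adds up the shares it received and publishes only
    that partial sum. No single share or partial sum reveals an input.
    """
    n = len(private_values)
    # distributed[i][j] is the share of party i's input held by party j.
    distributed = [make_shares(value, n) for value in private_values]
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


# Hypothetical figures: three operators' private readings.
print(secure_sum([120, 340, 95]))
```

Each individual share is a uniformly random number, so a party that sees only its own shares learns nothing about the other inputs; only the combined total is revealed.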

PPTs have proven their effectiveness in various sectors. In the healthcare field, for example, homomorphic encryption enables secure collaboration between researchers and healthcare providers without compromising patient privacy. Financial institutions use them to analyze transactional data securely, while e-commerce platforms leverage them to enhance user personalization without exposing individual purchasing habits.
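To show what "computing on encrypted data" can look like, here is a toy sketch of the Paillier cryptosystem, a well-known additively homomorphic scheme. It is not taken from any of the sources above. The primes are deliberately tiny demo values (an assumption for readability), so this is in no way secure; production systems rely on vetted libraries and much larger keys.

```python
import math
import secrets


def generate_keys(p=1009, q=1013):
    """Toy key generation. p and q are tiny demo primes, far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this form assumes the generator g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


# Hypothetical wearable readings, encrypted before leaving the device.
n, private_key = generate_keys()
encrypted_readings = [encrypt(n, bpm) for bpm in (72, 65, 80)]

# An untrusted server adds the readings without ever seeing them.
encrypted_total = encrypted_readings[0]
for c in encrypted_readings[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

print(decrypt(n, private_key, encrypted_total))
```

Only the holder of the private key can read the total, and the server that performed the addition never handles a plaintext reading. Paillier supports addition only; schemes that also support multiplication on ciphertexts (fully homomorphic encryption) are considerably more complex.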

How can differential privacy help?

Differential privacy (DP) emerges as a solution to facilitate safe data sharing in light of the new Data Act. DP makes it possible to analyze and share sensitive data while safeguarding individuals’ privacy by adding calibrated statistical noise to the results of computations on the data. Because the noise bounds how much any single record can influence an output, DP unlocks the analysis of even sensitive data fields while limiting what can be learned about any individual. Furthermore, unlike encryption methods, DP protects privacy at the output level, preventing sensitive data leaks from malicious or negligent queries, which can occur even if the underlying data is encrypted.
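The noise-adding step can be sketched with the Laplace mechanism, the textbook way to answer a counting query under differential privacy. The data, function names, and epsilon value below are assumptions for illustration. The sketch also uses Python's general-purpose random number generator and plain floating point, both of which are known weak points in naive DP implementations, so treat it as a teaching aid rather than a hardened implementation.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records satisfying `predicate`.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon. Smaller epsilon
    means more noise and stronger privacy.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical resting heart rates (bpm) collected from wearables.
readings = [62, 105, 71, 118, 99, 101]

noisy = dp_count(readings, lambda bpm: bpm > 100, epsilon=0.5)
print(f"Noisy count of readings above 100 bpm: {noisy:.2f}")
```

The true answer here is 3, but each run returns a slightly different value around it. An analyst still gets a useful aggregate, while the presence or absence of any one person's reading is masked by the noise.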

Within the framework of the Data Act, DP enables organizations to share data for research and analysis while preserving the confidentiality of individuals’ sensitive information.

The Data Act and other privacy regulations underscore the importance of protecting individuals’ privacy when handling data. In this context, leveraging privacy-preserving technologies, especially DP, aligns with the objectives of these regulations. For instance, President Biden’s Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence emphasizes the necessity of enhancing privacy-preserving research and technologies, including cryptographic tools, to uphold individuals’ privacy.¹³

Moreover, the evolving landscape of data privacy legislation, as elucidated in the article “The New Rules of Data Privacy,” highlights the growing significance of embracing privacy-preserving technologies to ensure adherence to regulatory requirements and foster consumer trust.¹⁴

In short, privacy-preserving technologies, particularly DP, are pivotal in facilitating secure data sharing and supporting compliance with data privacy regulations such as the Data Act. They help strike a balance between the need for data analysis and research and the need to safeguard individuals’ privacy.

Conclusion

The Data Act stands as a comprehensive initiative within the European Union, tackling the challenges and opportunities posed by data. It aims to eliminate obstacles to data access for both private and public entities while preserving incentives for data investment. This act is a cornerstone of the EU’s regulatory framework, essential for maintaining competitiveness in the global landscape of artificial intelligence. The openness and sharing of data offer benefits for research endeavors and enhance our understanding of the emerging superintelligence.

However, fully leveraging the data flow from interconnected devices will necessitate striking a delicate equilibrium between information exchange and safeguarding individual privacy. Risks stemming from corporate competition and geopolitical dynamics, notably the threat of reverse engineering for product emulation, demand meticulous attention.

Yet, the most significant risk remains re-identification. While the act’s regulations provide a foundation for data sharing, the true differentiator will be the implementation of technical and organizational measures. PPTs emerge as potent solutions across various sectors, ensuring secure data management and fostering collaboration and innovation.

Because the Data Act covers non-personal as well as personal data, it becomes imperative to extend the same level of scrutiny to both. PPTs are poised for significant advancement and are likely to become the foremost tool for balancing the acquisition of more knowledge against the safeguarding of privacy, even when that means giving up some knowledge.

¹ What does the new EU Data Act bring to companies, innovators and Europeans? https://multimedia.europarl.europa.eu/en/video/what-does-the-new-eu-data-act-bring-to-companies-innovators-and-europeans_N01_AFPS_231303_DATA
² See note 1.
³ www.Openmined.org
⁴ Privacy Preserving Tech, Tools for data use. https://blog.openmined.org/privacy-preserving-tech-tools-for-safe-data-use/
⁵ Roschier, The new EU Data Act enters into force in January 2024, 2 January 2024, https://www.roschier.com/newsroom/the-new-eu-data-act-enters-into-force-in-january-2024/?post_date=20240103091209
⁶ Pieter Haeck, “Europe’s new data law explained”, 22 June 2023, Politico, https://www.politico.eu/article/europe-new-data-act-explained/
⁷ See note 6.
⁸ See note 6.
⁹ Theresa Ehlen, “The Data Act and the EU’s Digital Agenda – striking a balance in trade secret protection?”, 14 November 2023, https://technologyquotient.freshfields.com/post/102isit/the-data-act-and-the-eus-digital-agenda-striking-a-balance-in-trade-secret-pro
¹⁰ The New York Times, You Got a Brain Scan at the Hospital. Someday a Computer May Use It to Identify You. https://www.nytimes.com/2019/10/23/health/brain-scans-personal-identity.html
¹¹ Kris Kuo, “You can be identified by your Netflix watching history”, 31 July 2020, Artificial Intelligence in Plain English, https://ai.plainenglish.io/ahh-the-computer-algorithm-still-can-find-you-even-there-is-no-personal-identifiable-information-6e077d17381f
¹² https://www.biorxiv.org/content/10.1101/2019.12.17.879346v1
¹³ https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/
¹⁴ https://hbr.org/2022/02/the-new-rules-of-data-privacy