When Alex bought the XFace video cam, he was probably unaware that what unlocks his front door without a key is a face recognition algorithm powered by artificial intelligence.
The main reason Alex opted for a keyless system was to put an end to the countless occasions on which he had been locked out of his house because he had forgotten the key. Moreover, XFace was relatively cheap and came with many features that made it a best-buy product.
By purchasing it, Alex agreed to the terms and conditions that the producer issued as a requirement for processing the data collected by the camera. But because Alex was not a computer scientist or an expert in artificial intelligence, he probably did not know how the software that powers the camera works. Code libraries and neural networks trained on millions of pictures are complex topics that are neither within a layperson’s grasp nor made accessible by reading the full terms. Alex did not know, for example, that some of the most common libraries used to power face recognition algorithms are exposed to gender and racial bias, as shown by MIT researcher Joy Buolamwini. Indeed, some of these libraries perform well on “lighter-skinned women” but make errors “at least 10 times more frequently when examining photos of dark-skinned women”.
This represents a risk whose consequences Alex had not fully taken into consideration when he left to IoT designers the “decision” on how to let him enter his house.
One explanation for Alex’s decision is that many human activities rely on a different perception of risk. Indeed, the risk of an undesirable event can be measured in quantitative terms by its probability and its severity. The first concerns the chance that the undesirable event will happen (in this case, being locked out because of algorithmic bias), while the second concerns the size and seriousness of the consequences (for example, Alex’s young daughter is in danger and he cannot enter). The combination of these two dimensions leads individuals to evaluate the acceptability of a risk in relation to the consequences that may follow a decision.
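As a minimal sketch of this two-dimensional view, the snippet below scores a risk as probability times severity and compares the result with an acceptability threshold. The multiplication, the 1-to-5 severity scale, the threshold and the example figures are all illustrative assumptions; none of them comes from the GDPR or from an established risk methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """An undesirable event described by its probability and severity."""
    name: str
    probability: float  # chance that the event happens, between 0 and 1
    severity: int       # seriousness of the consequences, 1 (minor) to 5 (critical)

    def score(self) -> float:
        # Assumption: the two dimensions are combined by simple multiplication.
        return self.probability * self.severity


def is_acceptable(risk: Risk, threshold: float = 0.2) -> bool:
    """Return True when the combined score stays at or below the threshold."""
    return risk.score() <= threshold


# Hypothetical figures for Alex's case: same probability, different severity.
locked_out_minor = Risk("locked out, nobody at home", probability=0.05, severity=2)
locked_out_child = Risk("locked out, daughter in danger", probability=0.05, severity=5)

for risk in (locked_out_minor, locked_out_child):
    print(f"{risk.name}: score={risk.score():.2f}, acceptable={is_acceptable(risk)}")
```

With these made-up numbers the same probability of failure is tolerable in one scenario and not in the other, which is the kind of individual judgment the paragraph describes.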
However, there are some activities whose risk is difficult to assess. Among them, those involving new technologies that collect and process huge amounts of data are particularly relevant. This is the case, for example, of the IoT, whose ubiquitous nature raises major concerns because of its massive collection of data.
The Eclipse 2018 IoT Dev survey revealed that “developers are starting to realize that beyond the ‘cool’ factor of building connected devices, the real motivation and business opportunity for IoT is in collecting data and making sense out of it”. Seamlessly gathered IoT data fuel machine learning-backed decision processes that are gaining momentum in people’s daily lives. This is true inasmuch as these techniques assign people to clusters that determine, for example, how goods and services are supplied to owners and customers.
Far from discouraging developers from engaging in new business opportunities, the General Data Protection Regulation (GDPR) requires those who are responsible for the collection and processing of data to address the risk bound to these activities. Since May 25th, 2018, the GDPR (which has unified the data protection regimes of the EU’s 28 member states) has introduced under article 35 a procedure to assess this risk.
Our project is deeply concerned with the development of a risk assessment tool (PESIA) that would help developers and designers assess the societal and ethical risks bound to certain design choices. For this reason, in deliverable 4.1 Virt-EU researchers from Politecnico di Torino carried out an in-depth analysis of the risk assessment model foreseen by the GDPR, highlighting its strengths and weaknesses. The aim of this analysis is to develop a tool that is consistent with the GDPR’s Data Protection Impact Assessment but at the same time strengthens the capacity to assess the ethical and societal risks bound to the processing of non-personal data.
The risk assessment rationale in the EU before the GDPR
Touted as a revolution in the data protection discipline, the GDPR has been greeted by alarmed headlines all around the web. Yet, beyond the noisy claims, what the GDPR basically does is rethink the risk assessment rationale.
In doing so, the legislators revived the prominence of the accountability principle set forth by the Council of Europe in Convention 108. The rationale behind this principle was to hold data controllers responsible by asking them to put in place all the adequate measures to guarantee lawful processing of data. In other words, data controllers must in the first place be responsible for assessing the risk that comes with the processing.
By contrast, the spread of computers in the eighties led legislators to pursue another orientation. The idea was to place on data subjects’ shoulders the burden of self-assessing the consequences of data processing. The spread of computers among the masses was thought to foster greater individual awareness of the electronic processing of information. For this reason, through Directive 95/46/EC, legislators sought to put the emphasis on individual decisions, transforming accountability into “terms and conditions”. This rationale was prompted by the consideration that, to the extent that data are important in shaping personality and individual life, the best judge to run the process of informational self-determination was the individual herself. Thus the “notice and consent” mechanism featured in Directive 95/46/EC was inclined towards an individual assessment of risk.
Today’s technological landscape may lead someone to label the legislators’ choice as naïve. Yet they could not have foreseen that individuals’ informational self-determination would be swallowed up by a data maelstrom. As a matter of fact, the advent of machine learning techniques and the proliferation of data sources have made it possible to combine, mix and cross-check data that are increasingly used to fuel automated decision-making processes. In such a realm, the data subject’s right to informational self-determination is partially undermined.
For this very reason, while preserving the individual consent mechanism, the GDPR has moved towards tougher accountability. As featured in the GDPR, the model of risk assessment draws upon an array of procedures and principles that call for a stricter assessment, to be performed by those who process the data.
How should a data controller assess the risk of data processing within the GDPR framework?
What does it mean to be responsible and to thoroughly assess the risky activities bound to the collection and processing of data subjects’ data in the aftermath of the GDPR’s entry into force?
The example of the XFace video cam might help to answer this question. Along with the face recognition algorithm, the XFace video cam comes with other functionalities. It sends customers a message every time a car is parked on their property and, at the same time, live streams a video to their phone. The camera installed inside the house also has an inbuilt control system that checks whether light bulbs are switched on, thus allowing customers to switch them off remotely.
Now the XFace video cam has to comply with the GDPR, which imposes a rights-based approach to risk assessment. It does not treat risk as a tradeoff between risks and benefits; instead, it safeguards certain fundamental rights. When it comes to data protection, every risk that could damage a data subject’s rights should necessarily be avoided, no matter what is lost in terms of benefits.
For this very reason, the first consideration is that data controllers should be mindful of the processing of personal data. Art. 4 recalls the definition of personal data as “any information relating to an identified or identifiable natural person…”.
Thus, the controllers should primarily focus their attention on those data that, if processed, might cause “material and non-material damages” that prejudice the “rights and freedoms of natural persons” (Recital no. 75, GDPR).
Data collected through the XFace recognition algorithm surely belong to this category, as do location data, online identifiers and identification numbers, which can all be used to indirectly reveal someone’s identity. Once the type of data at the heart of the processing operations has been ascertained, the data controllers have to deal with the “purpose limitation” and “data minimization” principles (art. 5). Before starting any assessment of the risk, the controller should limit the processing to those data that are necessary to deliver the service for which the data have been collected. In the case of XFace, for example, the data collection regarding light bulbs has to be limited to the feature of remotely switching the light off. Extending the collection of data to infer energy consumption patterns may go beyond the processing to which data subjects have consented. Meeting these principles constitutes a precondition for starting the risk assessment.
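To make the light bulb example concrete, here is a minimal sketch of how a device backend might enforce data minimization before storing anything. The purpose names, the field names and the `minimize` helper are hypothetical and only serve the illustration; nothing here describes a real XFace API.

```python
# Hypothetical mapping from each declared purpose to the fields it needs.
ALLOWED_FIELDS = {
    "remote_light_switch": {"bulb_id", "is_on"},
    "face_unlock": {"face_embedding", "door_id"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields that the declared purpose actually requires."""
    if purpose not in ALLOWED_FIELDS:
        raise ValueError(f"No declared purpose named {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


# A raw reading as the sensor might produce it (made-up values).
raw_reading = {
    "bulb_id": "kitchen-1",
    "is_on": True,
    "watts": 9.5,                        # would enable energy profiling
    "timestamp": "2018-06-01T07:30:00",  # would enable habit profiling
}

stored = minimize(raw_reading, "remote_light_switch")
print(stored)  # only bulb_id and is_on remain
```

Dropping the extra fields at collection time, rather than filtering them later, means that data for an undeclared purpose such as energy profiling never reaches storage in the first place.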
The risk assessment model in the GDPR
The risk assessment model is mainly enshrined in arts. 24, 32, 35 and 36 of the GDPR.
The general requirements expressed through the purpose limitation and data minimization principles are devised to help XFace ascertain whether the processing of personal data comes with some risk. They are then complemented by a set of measures listed in art. 32, which aim to implement by default all the technical and organizational solutions that minimize the impact of data use on individual rights and freedoms (e.g. pseudonymization, anonymization, limits to data retention).
The assessment procedure is called, in the wording of art. 35, “Data protection impact assessment”. The procedure requires data controllers such as XFace to perform an impact assessment based on different modules. Its modularity is outlined in art. 35.7 and ideally foresees the following steps:
i) to describe the processing operations and the purposes of the processing;
ii) to assess the necessity and proportionality of the processing operations in relation to the purposes;
iii) to assess how the risks might harm the rights of the data subjects;
iv) to select and implement measures to prevent or mitigate the risks.
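The four steps above can be sketched as a simple checklist structure that a team might fill in while working through an assessment. This is only an illustration: the class, its field names and the example entries are assumptions, and a real DPIA is a documented analysis, not a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class ImpactAssessment:
    """Hypothetical record mirroring steps i) to iv) listed above."""
    operations_and_purposes: list[str] = field(default_factory=list)  # step i
    necessity_and_proportionality: str = ""                           # step ii
    risks_to_rights: list[str] = field(default_factory=list)          # step iii
    mitigation_measures: list[str] = field(default_factory=list)      # step iv

    def missing_steps(self) -> list[str]:
        """Return the labels of the steps that have not been filled in yet."""
        steps = {
            "i) operations and purposes": bool(self.operations_and_purposes),
            "ii) necessity and proportionality": bool(self.necessity_and_proportionality),
            "iii) risks to rights": bool(self.risks_to_rights),
            "iv) mitigation measures": bool(self.mitigation_measures),
        }
        return [label for label, done in steps.items() if not done]


# Made-up, partially completed assessment for the camera example.
draft = ImpactAssessment(
    operations_and_purposes=["face recognition to unlock the front door"],
    risks_to_rights=["biased recognition locks out some family members"],
)

print(draft.missing_steps())
```

Here the draft has described the processing and listed a risk but still lacks the proportionality assessment and the mitigation measures, so steps ii) and iv) are reported as missing.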
This scalable model sets a threshold in the notion of “high risk” to the rights and freedoms of natural persons, yet without providing a clear definition of high risk. Instead, art. 35.3 specifies three cases in which the DPIA is required. What emerges from these observations is that the DPIA is not mandatory for every processing operation, though the national data protection authorities can adopt both a list of cases in which a DPIA should be performed (art. 35.4) and a list of cases in which a DPIA is not required (art. 35.5).
Even when all the steps foreseen in the assessment procedure have been carried out, a high risk may still persist; in that case, data controllers turn to the national data protection authority for a prior consultation (art. 36).
The limits of the assessment model
We have seen how the model of risk assessment featured in the GDPR has been designed to evaluate the risk linked to the processing of personal data. But how does it perform with data that are not personal, yet can be used to take decisions that discriminate against the social group Alex belongs to, thus indirectly affecting Alex as an individual?
In the age of big data, everyone should be mindful that the more services someone gets from a technology, the more data are collected for processing. Such an observation is heuristically useful for identifying the kind of data the XFace video cam may collect beyond those considered personal in nature.
For instance, every time a member of Alex’s family enters the house, the face recognition software registers a timestamp that can be used to reconstruct the family’s habits: who works, who stays at home most of the time and, based on the number of family members, how many children Alex has.
These data, enriched and cross-checked with other data sources (e.g. census data), can be used for predictive policing that may lead to discriminatory practices. For example, let’s imagine that Alex lives on the outskirts of a big city. Maybe he moved there with his family because, for the same rent he used to pay in the center, he can now afford a bigger house. His wife works from home as a freelance journalist. The family also includes his young son and his older daughter, who attends high school. XFace recognizes that the family is composed of four members and that, while three of them leave the house every morning, one member spends most of her time inside, as a housewife commonly would. If these data were cross-checked with data indicating that Alex’s family lives in a low-income area, and with the inference that his wife is probably a housewife, a credit scoring system might conclude that the family is not eligible for a loan because it relies on a single salary.
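A minimal sketch shows how little processing such an inference requires. The event log below is entirely made up, and it assumes that the camera records exits as well as entries; the member labels, the two-hour cut-off and the `hours_away` helper are likewise hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical door events: (member label, ISO timestamp, "exit" or "enter").
events = [
    ("member_a", "2018-06-04T07:45:00", "exit"),
    ("member_a", "2018-06-04T18:10:00", "enter"),
    ("member_b", "2018-06-04T08:00:00", "exit"),
    ("member_b", "2018-06-04T14:30:00", "enter"),
    ("member_c", "2018-06-04T08:05:00", "exit"),
    ("member_c", "2018-06-04T16:00:00", "enter"),
    ("member_d", "2018-06-04T10:00:00", "exit"),
    ("member_d", "2018-06-04T10:40:00", "enter"),
]


def hours_away(door_events):
    """Sum, per member, the hours between each exit and the following entry."""
    last_exit = {}
    away = defaultdict(float)
    # ISO timestamps in the same format sort chronologically as strings.
    for member, stamp, direction in sorted(door_events, key=lambda e: e[1]):
        moment = datetime.fromisoformat(stamp)
        if direction == "exit":
            last_exit[member] = moment
        elif member in last_exit:
            elapsed = moment - last_exit.pop(member)
            away[member] += elapsed.total_seconds() / 3600
    return dict(away)


away = hours_away(events)
household_size = len({member for member, _, _ in events})
# Assumed cut-off: less than 2 hours away is read as "mostly at home".
mostly_home = [member for member, hours in away.items() if hours < 2]

print(household_size, mostly_home)
```

No name, face image or other directly identifying field is used here, yet the output already suggests a household of four with one member who is mostly at home, which is the kind of signal a scoring system could misread.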
It should be noted that these forms of discrimination are not necessarily against the law, especially when they are not based on individual profiles and only indirectly affect individuals as part of a category, without their direct identification. Moreover, within the EU, such data analysis focusing on clustered individuals may not represent a form of personal data processing, since the categorical analytics methodology does not necessarily make it possible to identify a person.
The PESIA model
The aim of the PESIA model is to develop an assessment tool that will help pay greater attention to the ethical and social implications of data use, as we have seen in the example of Alex. For this reason, we are working to develop an agile tool to be used on a voluntary basis. Furthermore, we wish to promote an open and participatory approach to risk assessment (the DPIA is internal and not public).
The ambitious objective we set for this assessment tool is, therefore, to take into account values that go beyond those protected under the GDPR (right to data protection, security, integrity of data, etc.).
So, the model we have devised to overcome the lack of accountability to social and ethical values has three different layers:
1) the common ethical values recognized by international charters of human rights and fundamental freedoms;
2) the context-dependent values and social interests of given communities;
3) a more specific set of values identified by IoT developers, concerning the specific data processing application.
The main aspects outlined in this article suggest that the existing Data Protection Impact Assessment should evolve into a broader and more complex Privacy, Ethical and Social Impact Assessment (PESIA). We are strongly committed to developing this tool with the aim of fostering an ethical attitude among Europe’s policymakers, industry and developers’ communities.