1. Introduction, description of the issue to be addressed
It is hardly questionable that technology is developing at an extremely accelerated pace: the spread of artificial intelligence (AI) in its various forms is unstoppable, and AI-based software is increasingly becoming part of our everyday lives in many areas.
In many cases, the use of AI involves the processing of personal data, which means that data protection issues cannot be avoided when designing, writing and using software based on the use of AI.
The aim of this thesis is to examine the main data protection issues raised by the design and use of artificial intelligence systems involving the processing of personal data in the light of the European Union's General Data Protection Regulation (GDPR). In particular, it asks whether the practice of the authorities in the field of artificial intelligence shows that the GDPR provides an adequate level of regulation, so that no new rules are required from a data protection perspective, only the proper application of the existing ones, or whether there is, or could be, a need for legislative action in this area.
Given that there is currently no legal definition of AI, this thesis necessarily takes as its starting point the definition of AI as proposed by the European Commission in its proposal for a Regulation on AI1 (the Proposal), and as proposed by the European Council and then the European Parliament in their proposal for a Regulation on AI. According to the Proposal, "'artificial intelligence system' (AI system) means software that is developed with one or more of the techniques and approaches listed in Annex I and can, for a given set of human-defined objectives, generate outputs such as content, predictions, recommendations, or decisions influencing the environments they interact with"2. The definition proposed by the European Council defines an AI system as follows: "'artificial intelligence system' (AI system) means a system that is designed to operate with elements of autonomy and that, based on machine and/or human-provided data and inputs, infers how to achieve a given set of objectives using machine learning and/or logic- and knowledge based approaches, and produces system-generated outputs such as content (generative AI systems), predictions, recommendations or decisions, influencing the environments with which the AI system interacts"3. According to the definition proposed by the European Parliament, "artificial intelligence system' (AI system) means a machine-based system that is designed to operate with varying levels of autonomy and that can, for explicit or implicit objectives, generate outputs such as predictions, recommendations, or decisions, that influence physical or virtual environments"4. 
This thesis makes no attempt to analyse the three groups of techniques and approaches listed in Annex I of the regulation as per the Proposal5, or the AI definitions quoted above. Instead, it takes as its point of reference the very broad definition of AI as proposed, under which essentially any software qualifies as AI if it performs some kind of operation on the basis of input data (with or without the help of machine learning) and produces an "output result", i.e. the program is able to draw a conclusion, or possibly make a decision, on the basis of the data and information provided, which it then makes available to the user of the AI. It is worth noting that, from a data protection point of view, in my opinion, the amendment proposal containing the Parliament's negotiating position does not contain anything new which would not follow from the GDPR6.
This paper approaches the questions raised (i.e. what kind of caution is advised from a data protection perspective when using artificial intelligence, what are the relevant issues that arise in connection with the use of this type of modern technology) from the perspective of the provisions of the GDPR, including the principle of data protection by design and by default as laid down in Article 25 of the GDPR and the obligations arising therefrom, and, in particular, Article 22 of the GDPR, the "prohibition" stipulated in Article 227 and the requirements for automated decision-making.
2. Privacy by design and by default
Where personal data are processed, whether or not using artificial intelligence, the provisions of the GDPR must be complied with; accordingly, they necessarily apply to the use of AI.
2.1 Article 25 of the GDPR
The principle of data protection by design in Article 25 (1) and the principle of data protection by default in Article 25 (2) of the GDPR impose an obligation on all controllers, regardless of the type of controller and the complexity of processing.
According to the principle of data protection by design, the data controller is obliged to establish and implement appropriate technical and organisational measures, both at the time of determining the method of data processing and during the processing, to ensure compliance with the principles and requirements of the GDPR and to provide effective and guaranteed protection of the rights of data subjects. The controller must do so taking into account the state of the art, the cost of implementation, the nature, scope, context and purposes of the processing and the varying likelihood and severity of the risk to the rights and freedoms of natural persons which the envisaged processing may pose.
The principle of data protection by default requires each controller to design and implement appropriate technical and organisational measures to ensure that personal data are processed only for the specific purposes for which they are intended by the controller. This obligation extends, inter alia, to the amount of personal data collected and the duration of their processing.
If the essence of Article 25 (data protection by design and by default, "DPbDD") were to be defined in one sentence, one could say that the controller is obliged to consider before processing how the envisaged processing will comply with the principles of the GDPR and other requirements, e.g. on data security, and how it can maintain compliance in an effective manner (by modifying or adapting certain features of the processing, where appropriate) on an ongoing basis throughout the processing. In the words of the European Data Protection Board: "Controllers shall implement DPbDD before processing, and also continually at the time of processing, by regularly reviewing the effectiveness of the chosen measures and safeguards."8
By explicitly referring to the data protection principles, Article 25 reiterates the principles set out in Article 5 of the GDPR, out of which, in the case of data processing by way of using AI, the principles of fairness, transparency, purpose limitation, data minimisation and accuracy are paramount. (Of course, other principles must also be met, such as the principles of lawfulness, storage limitation, integrity and confidentiality, and accountability.) The GDPR does not prescribe the specific measures to be taken by the controller, so controllers must themselves plan and design the measures for the processing they intend to carry out in a timely manner to ensure ongoing compliance with the legal requirements.
When judging compliance, "ongoing effectiveness" is key. The obligation in Article 25, like many other obligations in the GDPR, is not a static obligation, but a dynamic, ongoing requirement, and in order to comply with it, the GDPR provides guidance on the different aspects the controller needs to take into account9.
As regards the appropriate technical and organisational measures, Article 25 essentially refers to Articles 24 and 32. The former imposes an obligation on data controllers, essentially formulating the principle of data protection by design in a slightly different way, while the latter article imposes an obligation on both the controller and the processor to ensure adequate data security.
The risk-based approach is emphasised, given that a risk and its existence or non-existence are not static; the measures to be taken therefore cannot be taken once and for all, as they are intended to address changing risks. The data controller (and, under Article 32, also the data processor) must assess these risks, their likelihood and severity in advance and, on the basis of that analysis, take appropriate measures to ensure compliance with data protection law. Although the data protection impact assessment (DPIA) is not specifically addressed in this paper, it is worth noting that, in the case of processing through the use of AI, particular attention should be paid to whether a DPIA needs to be prepared prior to the start of the processing, which will often be the case given the technological novelty that artificial intelligence represents10. Even if an impact assessment is not necessary in a particular case, it is useful to document how the controller came to this conclusion, and it may be worth testing the algorithm by feeding it with "test data" and then observing how modification of the data changes the functioning of the algorithm.
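The kind of "test data" probing mentioned above can be sketched very simply: vary one input attribute at a time and record how the output changes, documenting the results alongside the risk assessment. The scoring function below is a purely hypothetical stand-in for an opaque algorithm, not any real product.

```python
# Hypothetical example: probing a scoring model with synthetic "test data"
# to document how changes in the input alter the output. The model below
# is an invented stand-in for an opaque algorithm.

def credit_score(age: int, income: int, late_payments: int) -> float:
    """Toy stand-in for an AI-based scoring algorithm."""
    return max(0.0, min(1.0, 0.5 + income / 200_000 - 0.1 * late_payments))

baseline = credit_score(age=40, income=50_000, late_payments=0)

# Vary one attribute at a time and record the effect on the output.
probes = {
    "late_payments=3": credit_score(40, 50_000, 3),
    "income=20_000": credit_score(40, 20_000, 0),
    "age=70": credit_score(70, 50_000, 0),
}

for label, score in probes.items():
    print(f"{label}: output changed by {score - baseline:+.2f}")
```

A run like this can also surface inputs that have no effect on the output (here, age), which is itself useful evidence for data minimisation: attributes that never influence the result arguably should not be collected at all.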
Adequate prior identification of the risks posed by the envisaged processing is essential, without which the controller (or processor) will not be able to take appropriate measures to address the risks, which will necessarily mean non-compliance.
Compliance with Article 25 of the GDPR is also important because when it comes to imposing a fine under the GDPR, authorities must take into account the extent of the controller's or processor's liability, the technical and organisational measures they have taken pursuant to Articles 25 and 3211.
2.2 AI from the perspective of Article 25
The above is applicable to data processing when using AI, but is particularly relevant when an organisation designs, tests and uses AI software, given that AI technology is particularly likely to exacerbate existing risks, create new risks or make existing risks more difficult to manage12.
The Guidelines of the European Data Protection Board emphasise that "[e]arly consideration of DPbDD is crucial for a successful implementation of the principles and protection of the rights of the data subjects. Moreover, from a cost-benefit perspective, it is also in controllers' interest to take DPbDD into account sooner rather than later, as it could be challenging and costly to make later changes to plans that have already been made and processing operations that have already been designed"13.
There are several main phases in the application of AI software, the first of which is design, followed by development, test use, followed by live use, real-life application (including continuous monitoring and supervision of the system's operation), and finally, the end of the software's use.
3. Specific GDPR provisions and the AI
The GDPR does not contain the term artificial intelligence, but its rules apply to any activity that involves the processing of personal data.
While the difficulties in aligning AI-based technology with the GDPR include compliance with the principles of transparency14, purpose limitation, data minimisation and accuracy, as well as giving adequate information and ensuring data subjects' rights, it is certainly an exaggeration to say that the GDPR makes the use of AI impossible; rather, it makes it more time-consuming and costly to operate an AI model in a compliant way15.
It is often said that certain provisions of the GDPR are not concrete enough and do not provide sufficient guidance for entities considering data processing. Indeed, the GDPR contains "neutral" provisions (e.g. requiring the taking of "appropriate technical and organisational measures"), the application of which requires balancing between competing interests, and, in the case of AI, the novelty and complexity of the technology and the extent of its impact on individuals make these issues even more pronounced16. However, there is no expectation of "zero tolerance" of risks, as in that case no data could be processed at all; the focus is on striking the right balance17. In the following, I will mention the main points of conflict in relation to the principles, legal bases and data subjects' rights that are particularly relevant in the context of AI-based data processing.
3.1 The data protection principles and the AI
3.1.1 Transparency and fairness
These principles mean, on the one hand, that concise and comprehensible information must be provided to data subjects18, and on the other hand, that, for example, when profiling is carried out, the controller needs to apply technical and organisational measures (including mathematical and statistical procedures) to ensure that the factors that cause inaccuracies in the data are corrected and errors are minimised. Similarly, the controller must take into account factors that pose a risk to the rights of the data subject and ensure that the processing is non-discriminatory19. This is particularly relevant in the case of automated decision-making.
In the light of these principles, it should also be considered whether "re-identification" is possible, i.e. if, due to technological developments or even the circumstances of data processing, (e.g. machine learning) a person may subsequently become identifiable despite the fact that, for example, pseudonymisation was originally used.
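The re-identification risk mentioned above can be illustrated with a minimal sketch (not a compliance recipe): pseudonymisation by keyed hashing replaces the direct identifier, yet anyone holding the key and a candidate list can link records back to individuals, which is why pseudonymised data remain personal data under the GDPR. The key and record values here are invented for the demonstration.

```python
# Illustrative sketch: pseudonymisation with a keyed hash. The data remain
# personal data, because whoever holds the key (or sufficient auxiliary
# data) can re-identify the individuals.
import hashlib
import hmac

SECRET_KEY = b"held-separately-by-the-controller"  # invented for the demo

def pseudonymise(identifier: str) -> str:
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"name": "Jane Doe", "diagnosis": "hypertension"}
training_row = {"subject": pseudonymise(record["name"]),
                "diagnosis": record["diagnosis"]}

# Re-identification: with the key and a list of candidates, the link
# back to the individual can be restored.
candidates = ["John Smith", "Jane Doe"]
matches = [c for c in candidates if pseudonymise(c) == training_row["subject"]]
print(matches)  # ['Jane Doe']
```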
Ensuring transparency can pose quite a challenge, especially in the case of machine learning20, where the author of the software does not know (even despite his/her best intentions) exactly why the machine reaches a particular conclusion. In such a case, the applicability of the technology is questionable, since if the author does not know how the model works, nobody will, including the data subject.
If the data are collected directly from the data subjects, the information must be provided to the data subjects at the time of collection, before the model is taught or, respectively, before the model is applied, whereas if the data are not collected from the data subject, the information must be provided within a reasonable time, but not later than one month, or earlier if and when the data subject is contacted or when the data are transferred to another person.
The question may arise as to the applicability of Article 14 (5) b) of the GDPR, which provides that information need not be provided where "the provision of such information proves impossible or would involve a disproportionate effort, in particular for processing for... scientific or... research purposes or statistical purposes, subject to the conditions and safeguards referred to in Article 89 (1)...". However, the GDPR also requires in such a case that the controller takes appropriate measures to protect the rights of the data subject.
3.1.2 Purpose limitation
One feature of AI models is that they allow data collected for one purpose to be used for another. For example, the use of a "Like" button to express an opinion can be used to infer psychological characteristics, political opinions and consumer habits.
The GDPR provides that data may not be processed in a way incompatible with the original purposes, adding that further processing for purposes such as scientific research or statistical purposes is not incompatible with the original purpose (if it complies with Article 89 (1))21. The legislation also provides guidance on what criteria may help to determine whether the new purpose is compatible with the original purpose when processing is not based on the data subject's consent or on EU or Member State law22. The question is what can be considered compatible with the original purpose in the context of the use of AI.
For example, where an AI model is trained to predict certain health risks to participating individuals based on the health data provided, the individual concerned could in all likelihood legitimately object if the database were then also used to offer insurance premiums to the individuals based on inferences drawn from the provided health data. The former use could clearly be beneficial for the data subject, as it could make it possible to detect a potential disease in time, but the latter could be extremely disadvantageous and discriminatory for the data subject.
The issue of data processing other than for the original purpose often arises, for example, in relation to large data sets, where newer and newer correlations may be revealed through data analysis. The question is where the line is/can be drawn between data processing that is compatible and incompatible with the original purpose. This will of course require a case-by-case analysis, but the GDPR states that for processing compatible with the original purpose of the processing, "no legal basis separate from that which allowed the collection of the personal data is required"23.
For example, if pseudonymised or encrypted data used to teach a model algorithm is used for statistical purposes (i.e. for a purpose other than the original purpose), it may be compatible with the original purpose. However, if the same data was used for profiling an individual, this may go beyond the limits of compatibility24.
3.1.3 Data minimisation
There seems to be an obvious contradiction between the principle of data minimisation and especially the processing of big data by artificial intelligence, since the latter is basically "interested" in getting access to as much data as possible in order to be able to draw conclusions and patterns (and learn from them).
It is important to note here that the words "as much data as possible" should not be taken literally, as the AI model is not interested in collecting data irrelevant (and/or inaccurate) for the purposes of the operation of the model, as this would undermine the reliability and thus the usability of the model. It is essential that the data are relevant and representative for the given purpose, otherwise it is certain that the model will not work properly and serious biases and discriminatory operation can be expected.
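In engineering terms, the principle translates into filtering at the point of ingestion: only fields documented as relevant to the model's stated purpose ever reach the training set. The field names below are illustrative assumptions, not a prescribed schema.

```python
# Minimal sketch of data minimisation at ingestion: only fields documented
# as relevant to the model's purpose are kept; everything else is dropped
# before the data ever reach the training set. Field names are invented.

RELEVANT_FIELDS = {"blood_pressure", "age", "smoker"}  # per the stated purpose

def minimise(raw_record: dict) -> dict:
    return {k: v for k, v in raw_record.items() if k in RELEVANT_FIELDS}

raw = {"name": "Jane Doe", "email": "jane@example.com",
       "blood_pressure": 135, "age": 52, "smoker": False}

print(minimise(raw))  # only the three purpose-relevant fields survive
```

Keeping the allow-list explicit in the code also serves accountability: it documents, in an auditable form, which data the controller has judged adequate and relevant for the purpose.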
3.1.4 Accuracy
AI-based algorithms can only be as good as the data used to teach them; good quality data is therefore essential for quality algorithms25. It is important to ensure that the input data is accurate from the moment the AI model is taught, especially when the data is used to make inferences or decisions about the data subject. Inaccurate data may in itself cause harm to the data subject, for example, if it is processed in a way that does not match his/her characteristics. At the same time, the accuracy of the output data should be ensured in such a way that it is based on accurate input data and the model performs the calculations correctly based on a "well-weighted" algorithm. It should be noted, however, that compliance with the principle of accuracy does not imply that the model must be statistically 100% accurate26. Some models are used to predict certain probabilities (e.g. susceptibility to illness, likelihood of late payment), so it is not a priori the case that the output of the model (the prediction) is definitely accurate (a fact that must also be stated in the privacy policy).
If an AI system is used to draw conclusions about individuals, it must be ensured that the system is statistically sufficiently accurate for the given purpose. This does not mean that all inferences have to be correct, but the possibility that inferences may be wrong and the impact this may have on the decision made on the basis of the inference must be taken into account. Failure to do so may mean a breach of the principles of fairness, accuracy and data minimisation (the latter because incorrect personal data may not be adequate and relevant in relation to the purpose).
It is essential to address false assessments (false positives, false negatives). For example, in the case of a CV pre-screening model, if the system recommends inviting for an interview someone who does not match the characteristics of the candidate looked for, energy and time is wasted, whereas if the system does not recommend inviting for an interview someone who would be an ideal candidate, an opportunity may be lost for both the employer and the candidate.27
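The false-positive/false-negative assessment described above is routinely done by comparing the model's recommendations against labelled test data. The sketch below uses invented labels and predictions for a hypothetical CV pre-screening model.

```python
# Sketch of the accuracy check described above: counting false positives
# and false negatives for a CV pre-screening model on labelled test data.
# Labels and predictions are invented for illustration.

def confusion_counts(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))          # correct invites
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))    # wasted interviews
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))    # missed candidates
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

# True = "should be invited to interview"
y_true = [True, True, False, False, True, False]
y_pred = [True, False, True, False, True, False]  # model's recommendations

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"false positives (wasted interviews): {fp}")
print(f"false negatives (missed opportunities): {fn}")
```

Which of the two error types matters more depends on the impact on the data subject: in recruitment, a false negative deprives a suitable candidate of an opportunity, so tolerances for the two error rates should be set, and documented, separately.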
3.1.5 Storage limitation
In relation to this principle, it is worth noting that the storage of personal data for longer than necessary is allowed where the processing of personal data is carried out for purposes such as scientific research or statistical purposes (in accordance with Article 89 (1) and provided that appropriate security measures are taken to safeguard the rights of the data subjects).
3.2 The legal bases and the AI
In order to specify the legal basis, it is important to name the purpose of the processing. In the case of a machine learning model, a distinction needs to be made between the development-design-learning phase and the application phase of the model28. The learning of the model itself does not necessarily have a direct consequence for the data subject, but direct consequences should be expected when the model is applied to the data subject.
Out of the legal bases, the most relevant for the use of AI are the consent of the data subject and the legitimate interest of the controller29. The difficulty with the former is that it is not easy to fulfil all of its conditions, and consent is difficult to obtain where there are many data subjects; in addition, the data subject may withdraw his/her consent unconditionally at any time, a possibility the controller should also take into account in advance.
Also, in terms of legitimate interest, a distinction should be made between when data is collected to teach an algorithm and when an algorithm is already in use and the data is an "input signal" that the model processes based on the algorithm (and on what it has learned) and provides a response. In the former case, the applicability of legitimate interest seems more obvious, provided, inter alia, that there is a legitimate purpose for the processing and that the necessary security measures are taken (e.g. pseudonymisation, encryption). In the latter case, it needs to be carefully examined whether the legitimate interest of the controller can serve as the legal basis, whether the data subject's consent should be relied upon instead, or whether not even consent would suffice.
Although legitimate interest is the most flexible legal basis for data processing, it is not always the most appropriate. For example, in cases where the use of personal data is not expected by the data subject or could cause unnecessary harm to the data subject, it may not be the appropriate legal basis. The use of legitimate interest also implies that the controller assumes additional responsibility for the protection of the rights of data subjects and has the obligation to justify the necessity and proportionality of the processing30.
3.3 The rights of the data subjects and the AI
The GDPR declares in Article 1 that it "protects fundamental rights and freedoms of natural persons and in particular their right to the protection of personal data".
The focus of the GDPR is on the data subject, his/her protection, and each provision is designed to protect the data subject, thus, the rights of the data subject are closely linked to, among others, the principles, including the provisions of Article 25.
I will briefly raise below the issue of AI and certain rights of the data subject, with a focus on the "right" in Article 22, which is also the subject of the case law discussed in Sections 4.1 and 4.2 below.
3.3.1 Right of access
One of the questions that arises in relation to the right of access is to what extent the rights of others may constitute an obstacle to the exercise of this right, for example, whether the right to information may be denied on the grounds of the copyright of the developer of the AI model. According to the GDPR, the right of access "should not adversely affect the rights or freedoms of others, including trade secrets or intellectual property and in particular the copyright protecting the software", but it is also clear that not all information can be withheld from the data subject on this basis31.
3.3.2 Right to rectification
Ensuring this right with respect to the data used to teach an AI model typically does not affect the operation of the model in any substantive way, but it is questionable whether it is technically feasible. With regard to the personal data generated by applying the model to the data subject, technical infeasibility is less likely. In any event, the right to rectification should be ensured both in the learning phase and in the application of the AI.
Footnotes
1 Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules for artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts (Brussels, 21.4.2021 COM (2021) 206 final 2021/0106 (COD)).
2 Article 3(1) of the proposed Regulation.
3 See the Council's "general approach" of 25 November 2022. Council of the European Union, Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts - General approach (https://data.consilium.europa.eu/doc/document/ST-14954-2022-INIT/en/pdf) (accessed on 27 August 2023), p. 71.
4 See the document on the European Parliament's negotiating position adopted on 14 June 2023, which contains hundreds of proposals to amend the text of the proposed AI Regulation. European Parliament, Amendments adopted by the European Parliament on 14 June 2023 on the proposal for a regulation of the European Parliament and of the Council on laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts (COM (2021) 0206 - C9-0146/2021 - 2021/0106(COD)) (1) (https://www.europarl.europa.eu/doceo/document/TA-9-2023-0236_EN.html) (accessed on 27 August 2023).
5 These are as follows: (i) machine learning approaches, (ii) logical and knowledge-based approaches, and (iii) statistical approaches, Bayesian estimation, search and optimization methods.
6 The Parliament's proposal lists six principles that apply to all AI systems: (i) human oversight of the AI system, (ii) technical resilience and security, (iii) privacy and adequacy of data processing, (iv) transparency, (v) diversity, non-discrimination and fairness, (vi) social and environmental well-being, and would prescribe certain obligations depending on the type of AI (e.g. for certain AI systems considered to be high-risk, a fundamental rights impact assessment and consultation with public authorities would be required, or, in the case of generative AI systems, public disclosure of the use of copyrighted data used to teach the system would be expected).
7 In essence, as a general rule, the provision set out in Article 22 (1) of the GDPR prohibits decisions based solely on automated processing which would produce legal effects concerning the data subject or which would similarly significantly affect him/her.
8 European Data Protection Board Guidelines 4/2019 on data protection by design and by default under Article 25, version 2.0 (date of adoption: 20 October 2020), p. 4.
9 The state of the art, the cost of implementation, the nature, scope, context and purposes of the processing, and the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing.
10 Depending on the characteristics of the processing, the need to carry out an impact assessment can also be inferred from Article 25. It is also outside the scope of this paper, but it is worth briefly mentioning that a so-called fundamental rights impact assessment may also be required before the start of certain processing using artificial intelligence, on the basis of the forthcoming Artificial Intelligence Regulation.
11 Article 83 (2) (d) of the GDPR.
12 Information Commissioner's Office, Guidance on AI and data protection (https://ico.org.uk/media/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/guidance-on-ai-and-data-protection-2-0.pdf) (accessed on 3 August 2023), p. 8.
13 European Data Protection Board Guidelines 4/2019 on data protection by design and by default under Article 25, version 2.0 (Date of adoption: 20 October 2020), point 36, p. 11.
14 Alan Turing said as early as the 1950s that a machine with the ability to learn will achieve its goals in ways that its creators and teachers could not foresee, in some cases without knowing the details of the machine's inner workings. This "black-box" phenomenon is specific to certain AI models.
15 In my view, one can agree with the approach in the proposed AI Regulation (shared by the European Council and the European Parliament), meaning that some AI-based processing should be prohibited in principle, given its highly invasive nature, while other AI-based processing as high-risk processing should be subject to specific additional conditions, but these are not the subject of this paper.
16 The impact of the General Data Protection Regulation (GDPR) on artificial intelligence, Study Panel for the Future of Science and Technology, EPRS | European Parliamentary Research Service, Scientific Foresight Unit (STOA), PE 641.530 - June 2020, Executive summary, p. III.
17 Information Commissioner's Office, op. cit., p. 13.
18 GDPR (58) Recital.
19 GDPR (71) Recital.
20 The Information Commissioner's Office, the UK's data protection watchdog also stresses that it can be a significant challenge to provide a clear explanation of how AI works. See Information Commissioner's Office, op. cit., p. 7.
21 Article 5 (1) (b) of the GDPR.
22 Article 6 (4) of the GDPR.
23 Recital (50) of the GDPR.
24 A particular issue is the processing of special category data, especially when the AI draws conclusions from non-special category data, which already qualify as special category data, such as when the algorithm draws conclusions from shopping habits and the analysis of "liked" content and comes to conclusions concerning psychological/health characteristics, sexual orientation, political opinions, which may be used to attempt to unconsciously influence the given person (for example, to sell certain products to them) or discriminate against them.
25 The "garbage-in, garbage-out" principle. European Union Agency for Fundamental Rights, Data quality and artificial intelligence - mitigating bias and error to protect fundamental rights, 2019 (https://fra.europa.eu/sites/default/files/fra_uploads/fra-2019-data-quality-and-ai_en.pdf) (accessed 31 July 2023), p. 1.
26 Information Commissioner's Office, op. cit., p. 39.
27 Information Commissioner's Office, op. cit., p. 41.
28 For example, one develops a facial recognition software that can later be used for various purposes (e.g. authentication, tagging friends online).
29 Article 6 (1) (b) of the GDPR may be more of a legal basis when applying the model to the data subject (less so when teaching the model), but only if the service cannot be provided with less intrusive processing and the processing is objectively necessary, for example, for the performance of a contract with the data subject. The use of legal bases under Article 6 (1) (c), (d) and (e) of the GDPR is not per se excluded, but is typically unlikely. The legal bases under clause (d) can be essentially excluded for the teaching of an AI model, while at the same time the applicability of clause (d) could be considered for the application of the AI model to the data subject, possibly in the case of a given medical use.
30 This is demonstrated by the so-called legitimate interest assessment (also known as balancing test) and, where appropriate, the data protection impact assessment. If such a document is prepared during the development and training phase of the AI, it will need to be reviewed as and when the data processing purposes become more specific, are supplemented, or, respectively, as the case may be, another legal basis may be required.
31 Recital (63) of the GDPR.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.