ICO's third call for evidence on generative AI
Background
On 23 April 2024, the UK Information Commissioner's Office ("ICO") announced its third call for evidence as part of its consultation series examining how data protection law applies to generative AI.
Previous consultations in this series have focused on:
- lawful basis for web scraping to train generative AI models; and
- purpose limitation in the generative AI lifecycle.
Please see here for our insights on the first consultation.
The focus of this third consultation is on how the accuracy principle applies to the outputs produced by generative AI models, and how the accuracy of the data used for training affects those outputs.
The ICO is seeking views from a range of stakeholders, from developers and users of generative AI to legal advisors and consultants working in this area. The results of this consultation will be used to shape the ICO's policy position on generative AI. This call for evidence is open until 10 May 2024, and responses can be submitted via this link.
In this blog post, we explore the ICO's analysis and the input it is seeking.
The accuracy principle and statistical accuracy
The accuracy principle refers to the legal principle of accuracy under the UK GDPR, which requires that personal data is "accurate and, where necessary, kept up to date". Where personal data is inaccurate, organisations are expected to take reasonable steps to erase or rectify the data "without delay".
This principle matters because it helps prevent the spread of inaccurate information about individuals and helps ensure that decisions concerning them are not based on erroneous data. The ICO distinguishes the accuracy principle under the UK GDPR from statistical accuracy, which refers to the accuracy of an AI system itself (i.e., how often the system guesses the correct answer, measured against correctly labelled test data).
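To make the notion of statistical accuracy concrete, the short Python sketch below computes it as the share of a system's answers that match correctly labelled test data. It is illustrative only: the function name, the complaint categories and the test data are hypothetical and are not taken from the ICO's consultation.

```python
def statistical_accuracy(predictions, labels):
    """Return the fraction of predictions that match the labelled test data."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("the test set must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical test set: complaint categories assigned by a model,
# compared against categories assigned by human reviewers.
labels = ["billing", "delivery", "billing", "refund", "delivery"]
predictions = ["billing", "delivery", "refund", "refund", "delivery"]

accuracy = statistical_accuracy(predictions, labels)
print(f"Statistical accuracy: {accuracy:.0%}")  # 4 of 5 correct
```

A high score on a measure like this says nothing, by itself, about whether a particular statement about a particular individual is correct, which is why the ICO treats the two concepts separately.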
Should the results produced by generative AI be accurate?
Developers and deployers of a generative AI model may fail to comply with the accuracy principle if inaccurate training data leads to inaccurate outcomes or decisions that have adverse consequences for individuals, such as reputational damage, economic harm or the spread of misinformation. The results produced by generative AI models do not necessarily need to be 100% statistically accurate to comply with the accuracy principle, but high statistical accuracy will be needed for models that are used to make decisions about people.
The ICO notes that personal data does not need to be kept up to date in all circumstances, and that accuracy is closely tied to the purpose of processing. To determine whether outputs need to be accurate, the specific purpose for which a generative AI model will be used should be established. For example, a tool that helps game designers create storylines does not need to produce accurate outputs, because its purpose is purely creative and it will not be used to make decisions about individuals. However, a tool used by a company to summarise customer complaints should produce accurate results, because the summary must accurately reflect the complaints (statistical accuracy) and contain correct information about the customer (accuracy principle).
Since generative AI models can be used for different purposes, it is also important to make clear whether the results produced are statistically accurate, and to explain any limitations on their accuracy clearly, so as to avoid misuse of, or overreliance on, those results.
Tips for complying with the accuracy principle
- Ensure clear communication between developers, deployers, and end-users regarding the intended purpose and level of accuracy of the model. Developers should assess and communicate the risk and impact of inaccurate outputs, known as "hallucinations," which can occur due to the probabilistic nature of generative AI models.
- Monitor and ensure appropriate use cases for generative AI models, especially in consumer-facing services. Provide clear information about the statistical accuracy of the application and easily understandable information about appropriate usage, and monitor user-generated content.
- Label outputs as generated by AI or as not factually accurate. This could involve embedding metadata in the output or making imperceptible alterations to it to record its origin.
- Provide information about the reliability of the output, for example, by using confidence scores.
- Curate training data to ensure sufficient accuracy for the intended purpose.
- Set clear expectations for users regarding the accuracy of the output and conduct research to ensure users interact with the model appropriately.
- Regularly monitor compliance with data protection accuracy obligations, ensuring that the usage is appropriate for the level of accuracy that the model can provide.
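By way of illustration, the tips on labelling outputs and providing reliability information could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an ICO-endorsed or standard approach: the `LabelledOutput` structure, its field names, the `label_output` helper and the 0.8 review threshold are all hypothetical, and the confidence score is assumed to be supplied by the deployer's own evaluation process, since a generative model will not necessarily expose a reliable one.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class LabelledOutput:
    """Generated text plus provenance and reliability metadata (hypothetical schema)."""
    text: str
    ai_generated: bool
    model_name: str
    intended_purpose: str
    confidence: float  # assumed to be between 0.0 and 1.0
    accuracy_notice: str
    generated_at: str


def label_output(text, model_name, intended_purpose, confidence, review_threshold=0.8):
    """Wrap model output with a label, a confidence score and an accuracy notice."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence < review_threshold:
        notice = "AI-generated. Low confidence: human review recommended before use."
    else:
        notice = "AI-generated. May contain inaccuracies; verify important details."
    return LabelledOutput(
        text=text,
        ai_generated=True,
        model_name=model_name,
        intended_purpose=intended_purpose,
        confidence=confidence,
        accuracy_notice=notice,
        generated_at=datetime.now(timezone.utc).isoformat(),
    )


# Hypothetical usage: a complaint summary with a low confidence score.
output = label_output(
    text="Customer reports a delayed delivery and requests a refund.",
    model_name="example-model",
    intended_purpose="customer complaint summary",
    confidence=0.55,
)
print(json.dumps(asdict(output), indent=2))
```

Keeping the label and the intended purpose alongside the text means downstream users see the stated limitations wherever the output travels, rather than only at the point of generation.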
What is the ICO requesting?
The ICO is seeking views on the ways in which decisions taken at different stages of the generative AI lifecycle affect the accuracy of the outputs. It is interested in hearing what organisations think about how the relationship between inaccurate training data and inaccurate model outputs should be assessed, measured, and documented.