Meemoo, the Flemish Institute for Archives, plays a pivotal role in preserving and disseminating the cultural heritage of this region in Belgium. Working with its cultural sector partners, meemoo digitises diverse heritage items, from newspapers and glass plates to audiovisual materials and Flemish masterpieces.
Part of meemoo’s mission is to make these digitised items more accessible and usable by enriching them with metadata labels. Faced with vast volumes of material, meemoo knew such an undertaking would only be feasible with the help of AI. That’s why it turned to Sopra Steria’s AI and data specialists.
With Sopra Steria’s support, meemoo embarked on its first large-scale AI initiative in 2023. This first project used AI to generate metadata labels identifying millions of people, places, and organisations across 170,000 hours of video and audio content.
To learn more, we spoke to meemoo's archiving manager, Matthias Priem, and Sopra Steria's AI and Data Science Director, Kimberly Hermans.
Could you provide an overview of the metadata project and its goals?
Matthias: In recent years, we at meemoo have digitised and archived vast amounts of audiovisual material. These digital archives are often difficult to search because they lack descriptions (metadata labels), but adding these manually is extremely time-consuming.
So, we aimed to generate descriptions for all the audiovisual material automatically with AI, while maintaining the highest ethical and privacy standards. The metadata lets us link names, places and other recognised entities to each other and to other archives or external sources.
Can you give an example of how this works?
Matthias: For instance, if a person appears in videos from both parliament archives and museum archives, such as footage of an exhibition opening, we use the same label to identify them in both. Using Wikidata links, we can further enrich the content, for example by linking to that person's political party or birthplace.
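To make the idea of a shared label concrete, here is a minimal sketch, assuming each recognised person is identified by their Wikidata QID. The Mention schema, the archive and item names, and the placeholder IDs such as Q_PERSON_A are hypothetical and do not describe meemoo's actual data model. P102 (member of political party) and P19 (place of birth) are real Wikidata properties, and the query string targets the public Wikidata Query Service; it only returns results when given a real QID.

```python
from collections import defaultdict
from dataclasses import dataclass

# Wikidata property IDs for the two enrichments mentioned in the interview.
MEMBER_OF_POLITICAL_PARTY = "P102"
PLACE_OF_BIRTH = "P19"


@dataclass(frozen=True)
class Mention:
    """One appearance of a recognised entity in one archive item (hypothetical schema)."""
    archive: str       # e.g. "parliament" or "museum"
    item_id: str       # identifier of the audiovisual item
    timecode_s: float  # seconds from the start of the item
    wikidata_id: str   # the shared label: the entity's Wikidata QID


def group_by_entity(mentions):
    """Collect every mention of the same entity under its shared label."""
    index = defaultdict(list)
    for m in mentions:
        index[m.wikidata_id].append(m)
    return dict(index)


def archives_for(index, wikidata_id):
    """Sorted list of the archives in which an entity appears."""
    return sorted({m.archive for m in index.get(wikidata_id, [])})


def enrichment_query(wikidata_id):
    """SPARQL for the Wikidata Query Service: party and birthplace, if recorded."""
    return (
        "SELECT ?party ?birthplace WHERE { "
        f"OPTIONAL {{ wd:{wikidata_id} wdt:{MEMBER_OF_POLITICAL_PARTY} ?party . }} "
        f"OPTIONAL {{ wd:{wikidata_id} wdt:{PLACE_OF_BIRTH} ?birthplace . }} "
        "}"
    )


# Placeholder IDs stand in for real QIDs.
mentions = [
    Mention("parliament", "vid-001", 12.5, "Q_PERSON_A"),
    Mention("museum", "vid-042", 300.0, "Q_PERSON_A"),
    Mention("museum", "vid-042", 305.0, "Q_PERSON_B"),
]
index = group_by_entity(mentions)
print(archives_for(index, "Q_PERSON_A"))  # ['museum', 'parliament']
```

Keying everything on an external identifier, rather than on a name string, is what lets a parliament recording and a museum recording point to the same person without any archive-specific matching logic.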
How did meemoo navigate the ethical dimensions of such a project?
Matthias: Being a government-funded organisation operating in the ethically charged cultural heritage space, we sought legal advice and established checklists to identify and mitigate risks. Additionally, we formed an ethics committee and held workshops to explore ethical questions from various professional perspectives.
What about legal issues?
Matthias: We made sure the work complies with the General Data Protection Regulation (GDPR) and found the Data Protection Impact Assessment (DPIA) a useful touchstone to check it against. Next, we investigated whether the models we used could be biased and what we could and could not do with regard to face recognition. All involved parties contributed to this process: legal experts, people who might be recognised in the videos, archives, and legal and ethical consultants. Based on the work done during the project, we created a practical legal and ethical framework to help archivists with the everyday use of AI, such as determining whether a person can be added to the reference set for face recognition.
Why was AI necessary in this project? How does your expertise fit in?
Kimberly: The sheer scale of the task made AI necessary. This project's scope alone covered 170,000 hours of material across 127 archives. It would have taken a human over 19 years of non-stop listening just to hear all the audio, let alone transcribe and annotate it. So, we used speech recognition to convert spoken words into computer-readable, time-coded text, applied entity recognition to the transcripts to tag individuals, locations, and organisations, and used facial recognition technology to tag the faces of specific public figures.
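The arithmetic behind the 19-year figure, and the way time codes from speech recognition carry through to entity tags, can be sketched as follows. This is a minimal illustration under stated assumptions: the Segment and EntityTag types are hypothetical, and tag_entities is a toy name look-up standing in for a real entity-recognition model. The interview does not name the commercial speech and entity-recognition tools that were actually used.

```python
from dataclasses import dataclass

HOURS_OF_MATERIAL = 170_000
HOURS_PER_YEAR = 24 * 365  # listening around the clock, no breaks


@dataclass
class Segment:
    """A time-coded stretch of transcript, as speech recognition might return it."""
    start_s: float
    end_s: float
    text: str


@dataclass
class EntityTag:
    label: str  # "PERSON", "LOCATION" or "ORGANISATION"
    text: str
    start_s: float
    end_s: float


def tag_entities(segments, gazetteer):
    """Toy stand-in for entity recognition: look up known names in each segment.

    Each tag inherits its segment's time codes, so a search hit can jump
    straight to the right moment in the recording.
    """
    tags = []
    for seg in segments:
        for name, label in gazetteer.items():
            if name in seg.text:
                tags.append(EntityTag(label, name, seg.start_s, seg.end_s))
    return tags


years_of_listening = HOURS_OF_MATERIAL / HOURS_PER_YEAR
print(f"{years_of_listening:.1f} years")  # 19.4 years

# Hypothetical transcript segments and a hypothetical list of known names.
segments = [
    Segment(0.0, 4.2, "Welcome to the exhibition in Antwerp."),
    Segment(4.2, 9.8, "The Flemish Parliament opened it today."),
]
gazetteer = {"Antwerp": "LOCATION", "Flemish Parliament": "ORGANISATION"}
for tag in tag_entities(segments, gazetteer):
    print(tag)
```

A production entity recogniser would find names it has never seen before and resolve ambiguous ones. The point of the sketch is only the data flow: because every tag keeps the time codes of its transcript segment, the metadata remains searchable down to the moment in the video.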
How did Sopra Steria assist meemoo on the technical side?
Kimberly: Our team helped by researching and assessing commercially available AI tools that met the stringent quality standards required. We facilitated discussions with vendors and built the necessary pipelines to connect meemoo's data to the platforms. Regarding face recognition, our team created a custom-made tool based on open-source models because there was no appropriate solution on the market. However, the challenge wasn't just in the algorithms; the hard part was integrating them into an end-to-end solution that worked in the specific context of cultural heritage.
This sector has demanding requirements, ranging from high ethical standards and complex copyright and digital-rights issues to interoperability and scalability. Meeting them required collaboration with experts from various fields, including archivists, historians, ethics experts, and others. Everyone's expertise was vital to the success of the AI solution.
How did Sopra Steria address potential biases in the face detection system?
Kimberly: During the working groups, feedback highlighted the need to investigate the risk of potential biases. However, instead of exploring every possible bias, we took a practical approach. Our analysis indicated that, in this case, face detection and recognition were the components most likely to introduce unwanted biases, so we defined precisely which aspects needed validation. The group selected 20 people from the reference set, and we checked whether the system correctly detected and recognised those individuals across 30 videos.
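One way to score a targeted validation like this is to compare, person by person, where annotators confirmed someone appears with where the system recognised them. The sketch below is only an assumption about how such a check could look: the interview does not say which metrics were used. The person and video IDs and the 0.9 threshold are hypothetical. Comparing recall per person, rather than as one overall figure, is what would surface a bias, for example a system that reliably recognises some people and systematically misses others.

```python
from collections import defaultdict


def per_person_recall(ground_truth, predictions):
    """Share of each person's confirmed appearances that the system recognised.

    ground_truth, predictions: sets of (person_id, video_id) pairs.
    """
    expected = defaultdict(int)
    found = defaultdict(int)
    for person, video in ground_truth:
        expected[person] += 1
        if (person, video) in predictions:
            found[person] += 1
    return {p: found[p] / expected[p] for p in expected}


def false_matches(ground_truth, predictions):
    """Recognitions the annotators did not confirm (possible misidentifications)."""
    return sorted(predictions - ground_truth)


def flag_underperforming(recalls, threshold=0.9):
    """People whose recall falls below a (hypothetical) acceptance threshold."""
    return sorted(p for p, r in recalls.items() if r < threshold)


# Hypothetical annotations for two of the selected people.
ground_truth = {("person_01", "video_01"), ("person_01", "video_02"),
                ("person_02", "video_01")}
predictions = {("person_01", "video_01"), ("person_02", "video_01"),
               ("person_02", "video_03")}

recalls = per_person_recall(ground_truth, predictions)
print(flag_underperforming(recalls))              # ['person_01']
print(false_matches(ground_truth, predictions))   # [('person_02', 'video_03')]
```

With 20 people and 30 videos, the ground-truth set stays small enough for archivists to annotate by hand, which fits the practical, narrowly scoped approach described above.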
Matthias, what were the outcomes of the project?
Matthias: The results were outstanding. We identified 3.3 million faces and around 6.5 million named entities, and transcribed 560 million words in different languages. In other words, in just a few months we accomplished what would have taken humans decades. Additionally, we developed a restricted-access interface for partner organisations, giving them a secure space to explore the data and its possibilities. Our partners are enthusiastic about the outcome, too. In 2023 alone, while the project was live, we received three prestigious awards. We will set up further AI projects to enrich Flemish heritage in the coming years.