Researchers at Lund University are developing a method for forensic speech comparison that draws on speech therapy, AI, mathematics and machine learning. The method will help police analyze audio recordings in criminal investigations.
Like fingerprints and DNA, the voice carries unique characteristics that can be linked to individuals. Speech and voice are influenced by several factors, such as the size of the vocal cords, the shape of the oral cavity, language use and breathing. While most people can perceive the gender, age or mood of a speaker, it takes specialist knowledge to objectively analyze the unique patterns of the voice – an area in which speech therapists are experts.
The police turned to Lund University for help analyzing audio recordings in an investigation. The request led to the development of forensic speech comparison as a method of evidence gathering.
The police often handle audio recordings in which the speaker is known, but also recordings that are analyzed in order to confirm or exclude a suspect.
– What we do at the moment is to have three assessors, speech therapists, analyze the speech, voice and language in the recordings in order to compare them. We listen for several factors, such as how the person in question produces their voice, articulates and seems to move their tongue and lips, says Susanna Whitling, a speech therapist and researcher at Lund University, in a press release.
Both larger datasets and cutting-edge analysis
The number of requests from the police has increased, making it difficult for analysts to keep up with all the recordings. To handle larger datasets, researchers have developed AI-based methods that can identify relevant audio files, which are then analyzed by experts.
– By combining traditional speech therapy perceptual assessment of speech, voice and language with machine learning, we want to make it possible to both scan large amounts of data and offer cutting-edge analysis. Based on the hits that the AI then extracts, experts can make a professional assessment, explains Whitling.
The researchers are also collaborating with Andreas Jakobsson, a professor of mathematical statistics, to develop specialized software. The vision is an accurate and reliable method for speech comparison.
– We speech therapists can do perceptual assessment and examine the probability that two recordings contain the same person’s speech, voice and language. To add specialized software for so-called acoustic analysis of features such as voice frequency, intensity and temporal variations, we collaborate with experts in signal processing and machine learning.
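To give a sense of what two of the acoustic measures mentioned above involve, here is a minimal Python sketch that estimates voice frequency (fundamental frequency) and intensity from a short signal. It is an illustration only and not the Lund researchers' software: the function names, the 75 to 400 Hz search range, the autocorrelation approach and the synthetic test tone are all assumptions made for this example, and temporal variations are not covered.

```python
import math


def rms_db(samples):
    """Intensity as RMS level in dB relative to an amplitude of 1.0."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def estimate_f0(samples, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate fundamental frequency in Hz from the autocorrelation peak.

    The search range f_min..f_max is an assumed range for speech,
    chosen for this illustration.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        n_terms = len(samples) - lag
        # Normalize by the number of terms so short lags are not favored.
        score = sum(samples[i] * samples[i + lag] for i in range(n_terms)) / n_terms
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical input: 0.2 seconds of a pure 100 Hz tone sampled at 8000 Hz.
# A real recording would be far noisier and would be analyzed frame by frame.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(1600)]

f0 = estimate_f0(tone, SAMPLE_RATE)
level = rms_db(tone)
print(f"Estimated frequency: {f0:.1f} Hz, intensity: {level:.2f} dB")
```

Real forensic tools rely on far more robust techniques than this. The sketch only shows that frequency and intensity can be computed directly from the audio signal, which is what allows software to complement the assessors' listening.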