

McSmythurs built a generic AI model that can turn any text into audio speech in English. When he wants to make a new voice, he tunes the model further with two or three hours of new data of that particular person speaking, along with a text transcript. “It focuses in on what makes a Homer voice a Homer voice, and the different frequencies,” he says.

After that, it’s a matter of asking the model to generate multiple takes – each one will vary slightly – and choosing the best one for your purposes. The outputs are recognisably Homer, but they sound a little emotionally flat, as if he’s reading out something that he doesn’t really understand the meaning of. “It does depend on the training data,” McSmythurs says. “If the model hasn’t been exposed to those quite wide ranges of emotion it can’t create it from scratch. So it doesn’t sound as energetic as Homer might.”

British startup Sonantic has developed a way of bringing that emotional range to AI voices. They work with voice actors to get a wide range of training data – several hours of the actors running through different lines, with different emotional tones.

“We know the difference between sarcasm and sincerity, and the tiny little clues in sound,” says John Flynn, Sonantic co-founder and CTO. “We stretch those natural points and nuances and inflections.”
