An Algorithm Generated Eerily Accurate Portraits Based Only On Someone’s Voice, by Melanie Ehrenkranz.
MIT researchers published a paper last month called "Speech2Face: Learning the Face Behind a Voice," which explores how an algorithm can generate an image of a face from a short audio recording of a person speaking. The portraits are not exact depictions of the speaker, but based on images in the paper, the system was able to create a front-facing face with a neutral expression that matched the speaker's gender, race, and age.
The researchers trained a deep neural network on millions of educational YouTube clips featuring over 100,000 different speakers, according to the paper. While the researchers note that their method doesn't generate exact images of a person from these short audio clips, the examples shown in the study do indicate that the resulting portraits eerily resemble what the speakers actually look like.
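At a high level, the pipeline the article describes takes a short voice clip in and produces a neutral, front-facing portrait out, which fits an encoder-decoder shape. The sketch below is a purely hypothetical illustration of that idea, not the paper's actual architecture: the array sizes, the names `voice_encoder` and `face_decoder`, and the randomly initialized weights (standing in for parameters that would be learned from the YouTube training data) are all assumptions made for the example.

```python
import numpy as np

# Hypothetical sketch of the Speech2Face idea: a voice encoder maps a
# spectrogram of a short audio clip to a face embedding, and a face
# decoder turns that embedding into a front-facing grayscale portrait.
# All shapes and weights here are illustrative, not the paper's model.

rng = np.random.default_rng(0)

SPEC_BINS, SPEC_FRAMES = 64, 100   # toy spectrogram size (bins x frames)
EMBED_DIM = 128                    # toy face-embedding dimension
IMG_SIDE = 32                      # toy output image: 32x32 grayscale

# Random matrices stand in for trained network parameters.
W_enc = rng.standard_normal((EMBED_DIM, SPEC_BINS * SPEC_FRAMES)) * 0.01
W_dec = rng.standard_normal((IMG_SIDE * IMG_SIDE, EMBED_DIM)) * 0.01

def voice_encoder(spectrogram):
    """Map a (bins, frames) spectrogram to a face-embedding vector."""
    x = spectrogram.reshape(-1)          # flatten to one feature vector
    return np.tanh(W_enc @ x)            # toy single-layer "encoder"

def face_decoder(embedding):
    """Map a face embedding to pixel intensities in (0, 1)."""
    logits = W_dec @ embedding
    pixels = 1.0 / (1.0 + np.exp(-logits))   # sigmoid keeps values in (0, 1)
    return pixels.reshape(IMG_SIDE, IMG_SIDE)

# A silent (all-zero) clip still yields a well-formed "portrait" array.
clip = np.zeros((SPEC_BINS, SPEC_FRAMES))
portrait = face_decoder(voice_encoder(clip))
```

In the real system, the decoder would be trained to reconstruct canonical face images from embeddings, which is why the outputs come out front-facing and neutral regardless of the input recording's pose or expression.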