Holly Herndon on Her AI Baby, Reanimating Tupac, and Extracting Voices

Jan 7, 2020

Seven years ago, Holly Herndon—who holds a PhD from Stanford University’s Center for Computer Research in Music and Acoustics—was one of several musicians making a case for the laptop-as-instrument. Relying on software’s vocal manipulations in her debut album, Movement (2012), she called the device “the most personal instrument the world has ever seen.”¹ Three years later, with Platform, she explored what had become her more critical, uncertain relationship with technology.

Now she’s pitting artificial intelligence against itself, using digital engineering to expose its limitations while mining its capacities as a musical collaborator. Her aim is not to replace the human element, but to enhance it. For her third full-length album, Proto, Herndon and her partner, Mathew Dryhurst, birthed an AI “baby,” a synthetic multivoiced singer they named Spawn (decidedly female), housed in a device that resembles a portable TV. They employed her—alongside a large flesh-and-blood chorus—to help create Proto’s thirteen aurally engrossing tracks.





AI can often feel like a black box. “There’s a barrier for entry,” Herndon told me. “I get that, but there are opinions I have now that I definitely would not have had just from reading the buzzy articles floating around the internet.” She says that the experience is not nearly as flawless as Apple or Amazon would like us to think, nor is the programming that enables it free of human bias.

Herndon came to these opinions while exposing Spawn to a voice model or “training set”—selected data used to teach neural networks—comprising samples of her own voice as well as others. To make the song “Godmother,” for example, the duo fed Spawn percussion tracks by the musician they consider Spawn’s godmother, the Indiana-based experimental electronic musician Jlin, and Spawn performed them using Herndon’s voice. Herndon then spliced and edited Spawn’s output to mix the final track.

Herndon, a Tennessee native now living in Berlin, is almost always categorized as a musician, and a musician only. She does indeed release albums and go on tour; she’s signed with the record labels 4AD and RVNG Intl.; and her YouTube channel hosts her many music videos. But those videos are equally at home in an art context: she’s collaborated with artists like Martine Syms and Trevor Paglen, and performed at institutions such as the Palais de Tokyo in Paris and the Barbican Centre in London. The works, while often centering the singer herself, employ technologies that distort, glitch, and abstract the visual output, moving far beyond the symbolic and narrative clichés of more mainstream music videos. The video for “Eternal,” for example, was produced in part using face alignment, a computer vision technique that identifies the geometric structures of human faces as they appear in digital imagery. The result, in this instance, is a constant flickering between Herndon’s visage and those of people whose facial structure an algorithm matches to hers. So the video, instead of simply aiding one’s understanding of the lyrics or heightening one’s experience of the music, raises key questions about digital surveillance, visual manipulation, and personal identity.



“It feels so limiting to be stuck in a musical context,” Herndon said when we met last October at her and Dryhurst’s apartment and home studio nestled in Schöneberg, a quiet residential neighborhood in west Berlin. “There’s the academic music discourse and the popular music discourse, which is largely youth- and advertising-fueled—you can’t go very deep. But we’re adults, we want to talk about real stuff.” As we walked into the dining room for our discussion, I noticed a large framed photo-work leaning against a white wall. The image, depicting a tornado, was rendered in a pixelated style that looked familiar. I asked if the work was by Trevor Paglen.

 

HOLLY HERNDON Yes. Mat and I worked on audio for one of the works in Trevor’s 2017 exhibition “A Study of Invisible Images” at Metro Pictures in New York. Afterward, he sent us a PDF of AI-generated images to choose from and have printed and framed as a thank you. We went for this one, made by a neural network trying to conjure the image of a tornado. There is a connotation of power, which I’m not usually drawn to, but I found the picture really profound. I also sat for Trevor when he was making some Eigenfaces—a process in which you photograph someone going through their lexicon of facial expressions, and then a computer averages them. Facial recognition programs use these Eigenface composites to identify faces in the wild.

EMILY McDERMOTT I noticed that Paglen is also credited in some of your work, like the music video for “Eternal.”

HERNDON We have similar interests but very different executions, so there’s never any conflict. We often share ideas. Our old studio was around the corner from his in Kreuzberg, and we started resource-sharing a bit. We shot the video for “Eternal” in Trevor’s basement, using some of his Sight Machine cameras.² We’ve worked with his director of photography and some of his development team as well—one of them did some of the programming for the face alignment in “Eternal.”

McDERMOTT Let’s zoom out and look a little at your history. You grew up singing in a church choir in Johnson City, Tennessee, and later studied at Mills College in Oakland from 2008 to 2010, earning an MFA in electronic music. At what point did technology become so integral to your practice?

HERNDON During my high school exchange program in Berlin, I was introduced to electronic music, which is technology-driven in some ways. So I had little boxes—synthesizers, samplers, sequencers—that I was noodling around with, but I wasn’t obsessed. After I moved to the Bay Area, I was exposed to the less shiny and corporate, more DIY, side of technology. Everyone there is extremely tech literate, and the scene is extremely diverse. That’s when I took my interest in technology to a deeper level.

McDERMOTT You hadn’t used a computer in your musical practice until then?

HERNDON Not really. I didn’t grow up around much tech. When I got to college, I started taking programming classes. I had the opportunity to explore things that weren’t second nature to me. I mean, I used to be scared of computers. I worried that if I pressed the wrong button it would explode [laughs]. It was definitely a learning curve, but at Mills I figured out that technology is a language like any other, and there are degrees of proficiency.

McDERMOTT You began making music alone with the computer, but now, with Proto, you’ve brought people back into the process with a full vocal ensemble. Tell me about that journey.

HERNDON It was a long journey. The trajectory makes sense looking back, but it’s hard to rewind because our understanding of technology has changed so much. If I say now that, when I started out, it was controversial to play your laptop on stage, it seems absurd. But it really was! People would get angry about it, especially in the academy—I was also working on my PhD at Stanford from 2011 until 2019. Yet I was really passionate about the capabilities of this machine and wanted to make it a more relatable performance instrument. So for Movement, I decided to use a processed voice, because the audience could see a live performance happening onstage and understand what’s going on, even though what came out was totally different from what was going in.

With Platform, I was thinking about the post-internet scenario we were dealing with at the time. I worked with and met many of my collaborators on that album online. It was amazing that I could collaborate with people all over the place, and much of the writing for the album was about the internet; using the internet as a medium was really important to me.

McDERMOTT That carries through your entire practice: you use the actual technology in question to produce music, rather than just discuss it through lyrics.

HERNDON A lot of music suffers from this. People will write a song about topic A, but I want to come up with a production method that deals with topic A. The music can speak explicitly about it, but I don’t want to rely on lyrics or language. I want the process to be dealing with the object of critique.

McDERMOTT How did your collaborations begin to move from the digital sphere to the physical?

HERNDON On one hand, it was a luxury: I reached a point where I could afford to get to know people and record with them. On the other hand, I had come to miss performing with people in physical space. We toured Platform for two years and played a lot of electronic music festivals. At the time, there was a trend toward everything being automated and the performance being a mechanized choreography of lights and projections. I wanted to understand where the human fits in this highly mediated landscape, though not as in, “let’s go start an acoustic folk band”—that’s absurd. I don’t want to look back. But if we have these tools, where does the human fit in, and what’s the point? Is music just entertainment, is it our joy onstage? Is it a kind of communion we’re having together? I was asking myself these fundamental questions, and I missed the joy of singing with other people as well as the joy of the audience feeling it. When Colin Self joined the Platform tour about a year in, I was reminded how fun it is to sing with someone on stage. Now we’re even involving the audience, asking them to sing along with us on certain things.

McDERMOTT But you record those communal singing moments, and then turn around and use them as training sets for Spawn.

HERNDON Exactly, and that was a parallel conversation to developing the ensemble. Mat and I got a grant from [the German fellowship program] #bebeethoven, so we were able to work with [web developer and coder] Jules LaPlace, buy a souped-up gaming PC, assemble Spawn, and start experimenting.

McDERMOTT At first glance, building an AI and collaborating with humans seem like two divergent paths. How did these things develop in parallel and eventually converge?

HERNDON Well, AI is just us. AI is human labor obfuscated through a terminology called AI, and our goal is to use technology to allow us to be more human together. Forget about the AI for a minute and think about how to make the laptop an organizational brain for upward of ten people to perform around—that’s a challenge but it’s never about the laptop. The laptop has great capabilities, but it’s always about the communication between the people.

Also, things didn’t start out as this grand vision. Mat and I wanted to perform with people in real time, but still, we are both nerds interested in nerdy topics. We were hearing all of this AI stuff and were like, “I need to deal with it in order to have an opinion on it.” Through research, I learned a lot of people use existing score material as a training set to create works based on that style, which is basically a statistical analysis of a composer or genre that enables you to make those types of sounds forever. This is so problematic in so many ways in my mind; you get yourself into this aesthetic cul-de-sac, where you’re only making decisions based on those that were made before. To me, that’s not what music is. That process doesn’t make it alive, it makes it a historical reenactment.

McDERMOTT How did you approach things differently?

HERNDON From a very early stage, we decided we didn’t want to use MIDI [Musical Instrument Digital Interface], which translates a played musical passage into pitch, note, length, and rhythm so you can run a statistical analysis on the musical data. Once we decided we did not want to take a training set from someone else, we had to think about where it would come from. Eventually we started creating our own training sets, first with samples of my voice and then Mat’s, and then we started training with the ensemble.

When we were thinking about vocal styles for the ensemble, I was really attracted to global varieties of vocal harmony. I see singing as a technology, and if you look at the lineage of technological development from the earliest eras, there are many theories that say dissonant singing was used in communities around the world to ward off predators. Interrogating artificial intelligence entails thinking about and questioning human intelligence, cognition, and free will, as well as where all these concepts come from. You can see AI as a step along this evolutionary track, rather than something geniuses in Silicon Valley came up with; it is a project of human intelligence that’s been forming for a really long time—that’s something Reza Negarestani writes about in Intelligence and Spirit [2018]. We’re constantly redefining what it means to be human, what it means to be natural, or what nature is.

McDERMOTT First, you set out to make the laptop relatable to people as a performance instrument. It seems like you’re doing the same now with Spawn: showing that AI isn’t some sci-fi mystery, but something built by humans and part of our daily experience—if you ask Siri to add something to your calendar, you’re using AI; every interaction helps train Siri. Or, when you call Apple’s customer service or send a message to a help center online, you’re almost always communicating with an AI bot.

HERNDON That’s definitely something we’re trying to do—without being too didactic. I never want to lose the poetry or whatever it is about art, but I like to lift up the curtain and show that people are being sold a dishonest image of something.

McDERMOTT Maybe now we can talk about some of the opinions you’ve formed since starting to work with AI. I’m guessing that this marketing of a dishonest image has something to do with them.

HERNDON My most obvious and basic opinion is one that’s very popular right now: AI has all the biases of the people who are designing it. The 2016 election was a big wake-up moment for many people, who realized, “Oh, we’re being manipulated.” Yeah, we are. With Platform, I was thinking about the politics of platforms themselves—the ways that they’re manipulating us, and how they change our behavior. Shoshana Zuboff, a professor at Harvard Business School, wrote an opus [The Age of Surveillance Capitalism, 2019] that perfectly ties together Platform and Proto: it’s about how artificial intelligence amplifies some of the issues that are already built into the infrastructure of what socioeconomics scholar Nick Srnicek terms platform capitalism [a digital mode of business propelled by firms like Google, Amazon, Airbnb, and Uber]. Zuboff demonstrates that platform capitalism relies on AI to grow. One reason we chose the name Proto is to emphasize how algorithms define protocols. The rules set up on a very baseline infrastructural level will affect all the interactions that happen on top of it.

McDERMOTT Right, you’ve looked a lot at the early days of the web.

HERNDON Yes, and baked into the internet is this curatorial, hyper-individualist model of “I can take anything, put it on my own digital real estate, and monetize it—regardless of where it came from.” At the very baseline level of protocol—the ability to hyperlink, copy-paste, and other such things—is a radical decontextualization, a disregard for where shit came from and who worked on it. The system didn’t have to be designed that way, but it was; it’s built in a way that serves the advertising industry, which just wants product names and images to spread. Now the internet is a giant mall, so of course you’re going to have issues with public utilities like news and information. As we build the next AI, the next whatever, we need to ask, as we create the most fundamental layer, “What are our values?” Usually, by the time we figure out that we don’t like what’s going on, the protocols have already been developed. So it’s important to think about this now.

McDERMOTT But aren’t we already somewhat beyond the protocol level? AIs are already being trained on living humans’ voices, without any attribution or repercussions.

HERNDON Well, yes, and this has made me think a lot about our human archive and what it means to create something, release it, and share it. In her book Steal This Music: How Intellectual Property Law Affects Musical Creativity, Joanna Demers says that Miles Davis criticized hip-hop as “artistic necrophilia” for relying on past artistic decisions in our shared archive. I disagree with him, but I love that phrase. With AI, you could have a Tupac hologram and create an entirely new Tupac catalogue using his voice model, and it could be something he would’ve never opted for. What does it say about us as a society that we keep reanimating the dead for our entertainment?

McDERMOTT It’s not just that voices are being used without their owners’ consent, but also that the owners’ names are being lost. If an AI is trained on a Tupac voice model and it starts to make music, then Tupac is eventually going to be lost and it’ll just be AI sounds made in what you earlier called an “aesthetic cul-de-sac.” We as humans have a responsibility to give credit where credit is due.

HERNDON Exactly. Music has an especially bad track record when it comes to attribution. If you look at sampling, for example, many people have been sampled and not been paid. There are some legal restrictions on sampling, but there are fewer when it comes to AI, and I think it’s going to get really ugly really soon. Before working with AI, I didn’t think much about the sovereignty of an individual’s voice and what that means politically or aesthetically, but it’s a huge question. You can deepfake so much so easily now. We have some big questions on our hands.

McDERMOTT An article I recently read posed the question of whether or not royalties are owed if an AI creates music based on someone’s voice.

HERNDON If it’s a voice model made from my voice—even if it’s my publicly accessible, recorded voice—that voice model is my voice. The human whose voice trained it should be paid for it, even though I’m sure there’s some legal work-around. But I think the very concept of copyright is going to shift.

McDERMOTT It wasn’t written for the digital sphere.

HERNDON No, it really wasn’t, and there is a long history of legal cases about vocal sovereignty. Bette Midler and Tom Waits both won cases when corporations hired sound-alike singers for advertising purposes. So there is legal precedent for being able to protect your voice, but we still haven’t implemented that digitally. It’s also more complicated because if you think about someone like Elvis Presley, his vocal style came from a long list of largely African American singers. The history of pop music is one of emulation. The voice isn’t necessarily individual; it belongs to a community, to a culture, to a society. So “what is an individual’s voice?” and “how do you copyright it?” are very sticky questions. I don’t have the answers, but . . . Our society isn’t structured to give everyone credit, let alone pay them. It’s this curatorial hell.

McDERMOTT The imbalance between an individual’s rights and a corporation’s is growing increasingly problematic. That makes me think about the role machines play in exploiting human laborers.

HERNDON I don’t think machines have feelings or sentience—there is no humanlike artificial general intelligence right now—so you can’t exploit machines, but you can exploit the many people who are hidden by the machine. The machine helps exploit people and make the process seem clean and seamless.

McDERMOTT Which is something you seem to be working against with Spawn. Even though the machine is heavily involved, you always list your collaborators’ names and speak about the project using “we” instead of “I.”

HERNDON Yes, but even those gestures have been difficult. I tried to start working like that for Platform, but when I would try to talk about the book I was reading or a collaborator, people wanted to focus on my whatever. To a degree, I understand: I am the conduit, this central person, and at the end of the day I do sit there and spend a bajillion hours on my albums, but we need to find a way to celebrate someone’s achievements while still acknowledging the contributions of others.

McDERMOTT Going back to getting your hands dirty with AI via Spawn, what have you come to understand about its limitations and capabilities?

HERNDON AI is not that smart: it’s very low fidelity, it’s not real time, it is very slow and unwieldy. Spawn can take more than 24 hours to process someone’s vocal input. On the other hand, it has some unique capabilities that are pretty exciting-slash-scary. The AI can extract the logic of something outside its operator’s own logic and then re-create it. This is entirely new for computer music. When you’re dealing with algorithmic music, you’re designing a board game or a systematic logic, often with lots of randomness thrown in. In a Rube Goldberg way, you can then let the ball go and it makes all these decisions; you might not fully understand them all, even though you designed the process. With an AI, a neural network, you can take someone else’s Rube Goldberg, then the AI reads it and creates another Rube Goldberg, building on the same logic—and the operator or designer can have no idea how that was set up or what the components are. The AI can literally extract someone’s voice—which is new, amazing, and terrifying. It’s a very powerful tool and we need to actively decide the rules around it.

McDERMOTT This goes back to the idea of Tupac’s hologram and the aesthetic cul-de-sac.

HERNDON Right. I think a lot of people say, “let me train an AI with my past records and have it make a new record for me,” but then you’re just going to have a copy of what you’ve already done. The point is to be constantly developing and updating yourself as a human. You can use AI as a collaborator or tool, but not as a replacement. Why would I want to replace myself? This is also a problem with algorithmic recommendation systems. If my sixteen-year-old self’s tastes had been catered to fully by an algorithm on Spotify, I would be listening to the shittiest music. I needed interventions, that’s how you grow and get exposed to different things. We’re coddling people with algorithms. It’s depraved.

 

1 Mark Baynham, “Speaking in code: Holly Herndon explains why the laptop is the most personal instrument the world has ever known,” FACT magazine, Nov. 15, 2012, www.factmag.com.

2 For his 2018 Sight Machine performance at the Smithsonian American Art Museum in Washington, D.C., Paglen connected cameras to AI-programmed computers. The system “watched” the Kronos Quartet as they played onstage and responded with projected visual interpretations.

 

This article appears under the title “In the Studio: Holly Herndon” in the January 2020 issue, pp. 64–71.