What’s the cost of automating the human voice?
by Will Vunderink, editor-in-chief; illustration by Lucy Holtsnider, staff artist
There are few things that truly bother me. And really, with a life as carefree as that of the average Colorado College student (let’s not kid ourselves), I hardly have the right to complain about my petty annoyances or irritations. But there are some things I’m exposed to that I can’t help but hate with a deep, deep passion. Right at the top of that short list sits Auto-Tune, the pitch-correction software created in 1996 by Antares Audio Technologies. Whence and wherefore, I ask, this infernal machine?
Whether consciously or not, you’ve all heard the software in action. It’s used by the majority of recording artists on major record labels—most of what you hear on the radio. Usually, it discreetly shifts the pitch of a singer’s voice to the perfect note. But in some cases—beginning with Cher’s 1998 single “Believe” and then, perhaps more egregiously, in songs by modern R&B/hip-hop artists of the 2000s like T-Pain and Kanye West—Auto-Tune has pushed the human voice to its extreme, lending it an unnatural sound by abruptly and impossibly jumping between notes. Auto-Tune functions in two ways—to make singers sound like bubbly robots and to make talentless vocalists sound like they can actually sing. Both achieve the same result: automating the human voice and taking the most human element of music—its inherent imperfections—out of the equation.
That result is precisely what bothers me so deeply. The act of singing conveys and evokes emotion in a way that no other kind of expression—writing, painting, dance—can. The human voice is our most fundamental mode of expression, and our most basic instrument. The very imperfections that Auto-Tune eradicates are what make the voice so powerful and relatable.
Auto-Tune arose from the most improbable of origins: the oil industry. In a piece from June 2008, Sasha Frere-Jones, the New Yorker’s pop music critic, explains that Andy Hildebrand (Auto-Tune’s inventor) spent eighteen years as a seismic data explorer. He mapped the subsurface of the earth by sending sound waves into the ground and then recording their reflections with a device called a geophone. The mathematics used to interpret those reflections, a technique called autocorrelation, turned out to be just as effective at detecting pitch as it was at detecting oil.
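For the curious, the idea behind autocorrelation can be sketched in a few lines of Python. This is only an illustration of the general technique, not Antares’ actual code: the function name `detect_pitch`, the 8,000 Hz sample rate, the 80 to 1,000 Hz search range, and the synthetic 200 Hz test tone are all my own assumptions. The sketch slides a signal against a delayed copy of itself and looks for the delay at which the two line up best; that delay is one period of the note.

```python
import math

def detect_pitch(samples, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Estimate the fundamental frequency of `samples` by autocorrelation.

    Hypothetical helper for illustration; the search range is an assumption.
    """
    min_lag = int(sample_rate / max_hz)  # shortest period considered
    max_lag = int(sample_rate / min_hz)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Multiply the signal by a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The best-matching delay is one period of the waveform.
    return sample_rate / best_lag

SAMPLE_RATE = 8000
# A synthetic 200 Hz sine wave stands in for a sung note.
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(1024)]
estimate = detect_pitch(tone, SAMPLE_RATE)
print(f"estimated pitch: {estimate:.1f} Hz")
```

A real voice is far messier than a pure sine wave, so a practical detector would need more care (normalization, handling of octave errors) than this sketch attempts.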
In the hands of recording engineers and producers everywhere, Auto-Tune has detected, corrected, and altered the pitch of so many voices that using the software has become the industry standard. (It has even moved into the realm of live performances, although lip-synching is still far more common.)
A page on the Antares website explains how the software ensures that a vocalist hits the right notes: “Automatic Mode instantaneously detects the pitch of the input, identifies the closest pitch in a user-specified scale (including minor, major, chromatic and 26 historical and microtonal scales), and corrects the input pitch to match the scale pitch.” A recording engineer needs only to plug in the scale that corresponds to a given song, choose the retune speed (the speed at which Auto-Tune corrects the note) and the singer can essentially sing (or wail or moan) whatever he or she wants.
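The “closest pitch in a user-specified scale” step is simple enough to sketch. What follows is a rough Python illustration under my own assumptions, not the Antares implementation: it assumes standard equal temperament with A4 tuned to 440 Hz, represents a scale as a set of pitch classes (0 for C through 11 for B), and uses a hypothetical function name, `snap_to_scale`.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number for A4

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}   # pitch classes C D E F G A B
CHROMATIC = set(range(12))         # every semitone allowed

def snap_to_scale(freq_hz, scale):
    """Return the frequency of the scale note closest to `freq_hz`."""
    # Convert the frequency to a fractional MIDI note number.
    midi = A4_MIDI + 12 * math.log2(freq_hz / A4_HZ)
    center = round(midi)
    # Consider notes within half an octave either side that are in the scale.
    candidates = [n for n in range(center - 6, center + 7) if n % 12 in scale]
    nearest = min(candidates, key=lambda n: abs(n - midi))
    # Convert the chosen note back to a frequency.
    return A4_HZ * 2 ** ((nearest - A4_MIDI) / 12)

# A slightly sour 360 Hz note lands on different pitches depending on the scale.
in_c_major = snap_to_scale(360.0, C_MAJOR)      # pulled down to F
in_chromatic = snap_to_scale(360.0, CHROMATIC)  # pulled up to F sharp
```

The choice of scale matters: the same off-key note is dragged to F in C major but to F sharp when every semitone is allowed, which is why the engineer has to tell the software what key the song is in.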
In a 2009 TIME article on the subject, writer Josh Tyrangiel quotes a “Grammy-winning recording engineer” as saying, “I’ve had Auto-Tune save vocals on everything from Britney Spears to Bollywood cast albums. And every singer now presumes that you’ll just run their voice through the box.” This mindset raises the issue of the authenticity of modern pop recordings—to what extent can an Auto-Tuned vocalist be credited for his or her performance, when most of the recorded notes are only hit through a machine? And the rampant use of Auto-Tune raises another unfortunate prospect: the homogenization of music.
In the same TIME article, Tyrangiel quotes Rick Rubin, one of the most successful and sought-after rock producers ever: “Right now, if you listen to pop, everything is in perfect pitch, perfect time and perfect tune. That’s how ubiquitous Auto-Tune is.” Noting the same trend, Tyrangiel laments the fact that “pop is in a pretty serious lull at the moment.” In the effort to make vocalists stand out as gifted, pitch-perfect singers, the opposite effect emerges. The unique qualities of a singer’s voice—its timbre, its particular imperfections, its emotional expressivity—are smoothed over. Everyone sounds like everyone else.
But Auto-Tune is not restricted to the pop music world; it has shown up in a couple of surprisingly non-mainstream places as well. Take, for instance, Bon Iver’s For Emma, Forever Ago—an album created about as far from the mainstream as you can get: recorded in a small cabin in rural Wisconsin one winter and initially self-released. As “The Wolves (Act I And II)” begins building, getting louder and more chaotic halfway through, a number of voices harmonize wordlessly—some untreated and many Auto-Tuned. It’s a shock in the context of Bon Iver’s mostly acoustic, intimate neo-folk, and these couple of minutes stick out like a sore thumb. (Bon Iver’s following album, the Blood Bank EP, features an almost unlistenable song consisting entirely of harmonizing Auto-Tuned vocals. It pains me to think about it.)
More recently, Sufjan Stevens, too, has succumbed to the trend. On “Impossible Soul,” the twenty-five-and-a-half-minute closing song from The Age of Adz, Auto-Tuned vocals appear eleven minutes in, taking a promising (if overstuffed) song out of the realm of seriousness. On an album that seems designed as a deliberate refutation of Sufjan’s delicate voice—with heavy reverb and multi-tracking, vocals strained beyond their natural range, and a disorienting slap-back echo—the argument for making use of the ultimate vocal manipulator is clearer. But that doesn’t make it any more bearable. In the music of both Sufjan Stevens and Bon Iver, the effect stands out first and foremost as a distraction.
Sufjan has given perhaps the lamest defense of Auto-Tune that I’ve ever come across. In a recent interview with eyeweekly.com, the interviewer butters him up, saying that his use of Auto-Tune “sounds transcendent, like you’re trying to make your voice hit the next level.” Sufjan responds: “Well, because Auto-Tune corrects the pitch, there’s a kind of perfection of tone, and I think that perfection of tone is about the harmony of the universe, you know?” No, Sufjan, I do not know.
He maintains that the software’s “deeper meaning” is that “it’s pushing the human voice into perfect pitch.” But that’s not the software’s deeper meaning—that is its obvious, surface function. Its deeper significance lies in the fact that it artificially enhances one of the human race’s longest-standing, most powerful, raw, beautiful and fundamental ways of expressing itself. Just because certain vocals have been digitally altered to sound as if they were sung perfectly does not mean the human voice has been perfected, nor that it will ever reach perfection. And why should we want it to?
The alternative use of the software that has become so popular, the so-called “Cher effect,” is no less disconcerting, and is a hell of a lot more irritating than its intended use. When the “retune speed” is set to zero, as it is on T-Pain’s records, Auto-Tune instantly finds the “correct” note closest to the sung pitch and adjusts accordingly. The reason vocals treated this way sound so robotic is that the effect completely eliminates the human voice’s natural slide between notes. In conversation with Tyrangiel, creator Hildebrand says, “I never figured anyone in their right mind would want to do that.”
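To see why a retune speed of zero flattens the slide between notes, consider a toy model in Python. Everything here is an assumption made for illustration: the function `retune`, the idea of measuring retune speed in analysis frames, and the simple rule of closing a fixed fraction of the gap to the target each frame. I have no idea how Antares actually implements the setting. Pitches are given as MIDI note numbers (60 is middle C), with fractions representing the in-between pitches of a glide.

```python
def retune(pitches, retune_frames):
    """Pull each frame's pitch toward the nearest semitone.

    Toy model (an assumption, not Antares' algorithm): with
    retune_frames == 0 the output jumps straight to the target;
    larger values close only part of the gap on each frame.
    """
    step = 1.0 / (1 + retune_frames)
    output = []
    current = pitches[0]
    for sung in pitches:
        target = round(sung)              # nearest note, chromatic scale
        current += (target - current) * step
        output.append(current)
    return output

# A singer gliding smoothly from C (60) up to D (62).
glide = [60.0, 60.4, 60.8, 61.2, 61.6, 62.0]

robotic = retune(glide, retune_frames=0)  # staircase of exact notes
gentle = retune(glide, retune_frames=3)   # correction eases in gradually
print(robotic)
print([round(p, 2) for p in gentle])
```

With the speed at zero, the glide becomes a staircase: every frame sits exactly on a note, and the voice leaps a full semitone between frames. With a slower setting, the corrected pitch still travels through the in-between values, which is the slide a listener hears as human.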
But to Frere-Jones at the New Yorker, “the Auto-Tuned T-Pain is rarely a mopey presence. In his hands, the program becomes pop music’s rose-colored glasses, or a balloon’s worth of helium inhaled . . . His vocal hooks sound delirious, not desperate.” The sound is certainly delirious, but it is far more nauseating than it is joyful.
And from here, the usually reliable Frere-Jones gets carried away. “Aren’t some of the most entertaining and fruitful sounds in pop—distortion, whammy bars, scratching—the result of glorious abuse of the tools? [ . . . ] it’s hard to see how the invisible use of tools could imply an inauthentic product, as if a layer of manipulation were standing between the audience and an unsullied object.” I see the point he’s trying to make, but he’s missing a crucial distinction between “distortion, whammy bars, scratching” and Auto-Tune. Distorting a guitar changes its sound, but the guitarist must still play the notes you hear. Same goes for the whammy bar, which briefly changes the pitch of picked or strummed notes at the guitarist’s discretion, raising or lowering it as much as he or she wants. Scratching drastically changes the sound of a record, but again, it’s entirely manipulated by the DJ. In short, musicians, not machines, control these embellishments. Auto-Tune, on the other hand, takes the element of control—and musicianship—out of the act of singing.
Frere-Jones tries to give weight to his argument by writing that “even a purely live recording is a distortion and paraphrasing of an acoustic event.” Well, yes, a live recording is an unnatural replication of that event, but everything you hear in that recording was actually played and sung by the musicians (if the performance wasn’t a lip-synched pantomime to pre-recorded tracks, that is). The recording of the live show doesn’t correct the pitch of guitar notes that are out of tune.
As Frere-Jones concludes (while discussing tactics used by George Martin, the Beatles’ brilliant producer), it’s true that Auto-Tune is the newest generation (taken to an extreme) of “the older, more traditional tricks of tape-splicing, double-tracking the voice, and adding a little reverb.” But you can splice tape, double-, triple-, or quadruple-track vocal lines, and add reverb to your heart’s content—you’re still hearing a real human voice hitting the notes.
And what kind of world would we live in if popular music’s most legendary voices had been Auto-Tuned all along? John Lennon (not to mention George, Paul, and I’m definitely looking at you, Ringo), David Byrne, Al Green, Mick Jagger, Joni Mitchell, Bob Dylan, Billie Holiday, Marvin Gaye, Aretha Franklin, Smokey Robinson, Neil freaking Young? Don’t we react so strongly to their music because they are just as flawed as we are? Because they, too, can be blinded by love, destroyed by heartbreak, angry as hell, or overcome by nostalgia, and are able to express those feelings so effectively?
It’s emotion, intensity, phrasing, and so many other indefinable elements, not perfect pitch, that make a voice unique, unforgettable, and fundamentally human. By correcting those elements away, we’re moving from thrilling originality to processed homogeneity. Paradoxically, it’s in trying to push the boundaries and break the limitations of the human form, and perhaps to realize its potential fully, that we dehumanize ourselves most.