Can a mouse learn a new song?
Such a question might seem whimsical. Though humans have lived alongside mice for at least 15,000 years, few of us have ever heard mice sing, because they do so in frequencies beyond the range detectable by human hearing. As pups, their high-pitched songs alert their mothers to their whereabouts; as adults, they sing in ultrasound to woo one another. For decades, researchers considered mouse songs instinctual, the fixed tunes of a windup music box, rather than the mutable expressions of individual minds.
But no one had tested whether that was really true. In 2012, a team of neurobiologists at Duke University, led by Erich Jarvis, a neuroscientist who studies vocal learning, designed an experiment to find out. The team surgically deafened five mice and recorded their songs in a mouse-size sound studio, tricked out with infrared cameras and microphones. They then compared sonograms of the songs of deafened mice with those of hearing mice. If the mouse songs were innate, as long presumed, the surgical alteration would make no difference at all.
Jarvis and his researchers slowed down the tempo and shifted the pitch of the recordings, so that they could hear the songs with their own ears. Those of the intact mice sounded “remarkably similar to some bird songs,” Jarvis wrote in a 2013 paper that described the experiment, with whistlelike syllables similar to those in the songs of canaries and the trills of dolphins. Not so the songs of the deafened mice: Deprived of auditory feedback, their songs became degraded, rendering them nearly unrecognizable. They sounded, the scientists noted, like “squawks and screams.” Not only did the tunes of a mouse depend on its ability to hear itself and others, but also, as the team found in another experiment, a male mouse could alter the pitch of its song to compete with other male mice for female attention.
Inside these murine skills lay clues to a puzzle many have called “the hardest problem in science”: the origins of language. In humans, “vocal learning” is understood as a skill critical to spoken language. Researchers had already discovered the capacity for vocal learning in species other than humans, including in songbirds, hummingbirds, parrots, cetaceans such as dolphins and whales, pinnipeds such as seals, elephants and bats. But given the centuries-old idea that a deep chasm separated human language from animal communications, most scientists understood the vocal learning abilities of other species as unrelated to our own — as evolutionarily divergent as the wing of a bat is to that of a bee. The apparent absence of intermediate forms of language — say, a talking animal — left the question of how language evolved resistant to empirical inquiry.
When the Duke researchers dissected the brains of the hearing and deafened mice, they found a rudimentary version of the neural circuitry that allows the forebrains of vocal learners such as humans and songbirds to directly control their vocal organs. Mice don’t seem to have the vocal flexibility of elephants; they cannot, like the 10-year-old female African elephant in Tsavo, Kenya, mimic the sound of trucks on the nearby Nairobi-Mombasa highway. Or the gift for mimicry of seals; an orphaned harbor seal at the New England Aquarium could utter English phrases in a perfect Maine accent (“Hoover, get over here,” he said. “Come on, come on!”).
But the rudimentary skills of mice suggested that the language-critical capacity might exist on a continuum, much as a submerged land bridge might indicate that two now-isolated continents were once connected. In recent years, an array of findings has also revealed an expansive nonhuman soundscape, including: turtles that produce and respond to sounds to coordinate the timing of their birth from inside their eggs; coral larvae that can hear the sounds of healthy reefs; and plants that can detect the sound of running water and the munching of insect predators. Researchers have found intention and meaning in this cacophony, such as the purposeful use of different sounds to convey information. They’ve theorized that one of the most confounding aspects of language, its rules-based internal structure, emerged from social drives common across a range of species.
With each discovery, the cognitive and moral divide between humanity and the rest of the animal world has eroded. For centuries, the linguistic utterances of Homo sapiens have been positioned as unique in nature, justifying our dominion over other species and shrouding the evolution of language in mystery. Now, experts in linguistics, biology and cognitive science suspect that components of language might be shared across species, illuminating the inner lives of animals in ways that could help stitch language into their evolutionary history — and our own.
For hundreds of years, language marked “the true difference between man and beast,” as the philosopher René Descartes wrote in 1649. As recently as the end of the last century, archaeologists and anthropologists speculated that 40,000 to 50,000 years ago a “human revolution” fractured evolutionary history, creating an unbridgeable gap separating humanity’s cognitive and linguistic abilities from those of the rest of the animal world.
Linguists and other experts reinforced this idea. In 1959, the M.I.T. linguist Noam Chomsky, then 30, wrote a blistering 33-page takedown of a book by the celebrated behaviorist B.F. Skinner, which argued that language was just a form of “verbal behavior,” as Skinner titled the book, accessible to any species given sufficient conditioning. One observer called it “perhaps the most devastating review ever written.” Between 1972 and 1990, there were more citations of Chomsky’s critique than of Skinner’s book, which bombed.
The view of language as a uniquely human superpower, one that enabled Homo sapiens to write epic poetry and send astronauts to the moon, presumed some uniquely human biology to match. But attempts to find those special biological mechanisms — whether physiological, neurological, genetic — that make language possible have all come up short.
One high-profile example came in 2001, when a team led by the geneticists Cecilia Lai and Simon Fisher discovered a gene — called FoxP2 — in a London family riddled with childhood apraxia of speech, a disorder that impairs the ability of otherwise cognitively capable individuals to coordinate their muscles to produce sounds, syllables and words in an intelligible sequence. Commentators hailed FoxP2 as the long sought-after gene that enabled humans to talk — until the gene turned up in the genomes of rodents, birds, reptiles, fish and ancient hominins such as Neanderthals, whose version of FoxP2 is much like ours. (Fisher so often encountered the public expectation that FoxP2 was the “language gene” that he resolved to acquire a T-shirt that read, “It’s more complicated than that.”)
The search for an exclusively human vocal anatomy has failed, too. For a 2001 study, the cognitive scientist Tecumseh Fitch cajoled goats, dogs, deer and other species to vocalize while inside a cineradiograph machine that filmed the way their larynxes moved under X-ray. Fitch discovered that species with larynxes different from ours — ours is “descended” and located in our throats rather than our mouths — could nevertheless move them in similar ways. One of them, the red deer, even had the same descended larynx we do.
Fitch and his then-colleague at Harvard, the evolutionary biologist Marc Hauser, began to wonder if they’d been thinking about language all wrong. Linguists described language as a singular skill, like being able to swim or bake a soufflé: You either had it or you didn’t. But perhaps language was more like a multicomponent system that included psychological traits, such as the ability to share intentions; physiological ones, such as motor control over vocalizations and gestures; and cognitive capacities, such as the ability to combine signals according to rules, many of which might appear in other animals as well.
Fitch, whom I spoke to by Zoom in his office at the University of Vienna, drafted a paper with Hauser as a “kind of an argument against Chomsky,” he told me. As a courtesy, he sent the M.I.T. linguist a draft. One evening, he and Hauser were sitting in their respective offices along the same hall at Harvard when an email from Chomsky dinged their inboxes. “We both read it and we walked out of our rooms going, ‘What?’” Chomsky indicated that not only did he agree, but that he’d be willing to sign on to their next paper on the subject as a co-author. That paper, which has since racked up more than 7,000 citations, appeared in the journal Science in 2002.
Squabbles continued over which components of language were shared with other species and which, if any, were exclusive to humans. Those included, among others, language’s intentionality, its system of combining signals, its ability to refer to external concepts and things separated by time and space and its power to generate an infinite number of expressions from a finite number of signals. But reflexive belief in language as an evolutionary anomaly started to dissolve. “For the biologists,” recalled Fitch, “it was like, ‘Oh, good, finally the linguists are being reasonable.’”
Evidence of continuities between animal communication and human language continued to mount. The sequencing of the Neanderthal genome in 2010 suggested that we hadn’t significantly diverged from that lineage, as the theory of a “human revolution” posited. On the contrary, Neanderthal genes and those of other ancient hominins persisted in the modern human genome, evidence of how intimately we were entangled. In 2014, Jarvis found that the neural circuits that allowed songbirds to learn and produce novel sounds matched those in humans, and that the genes that regulated those circuits evolved in similar ways. The accumulating evidence left “little room for doubt,” Cedric Boeckx, a theoretical linguist at the University of Barcelona, noted in the journal Frontiers in Neuroscience. “There was no ‘great leap forward.’”
As our understanding of the nature and origin of language shifted, a host of fruitful cross-disciplinary collaborations arose. Colleagues of Chomsky’s, such as the M.I.T. linguist Shigeru Miyagawa, whose early career was shaped by the precept that “we’re smart, they’re not,” applied for grants with primatologists and neuroscientists to study how human language might be related to birdsong and primate calls. Interdisciplinary centers sprang up devoted specifically to the evolution of language, including at the University of Zurich and the University of Edinburgh. Lectures at a biennial conference on language evolution once dominated by “armchair theorizing,” as the cognitive scientist and founder of the University of Edinburgh’s Centre for Language Evolution, Simon Kirby, put it, morphed into presentations “completely packed with empirical data.”
Illustration by Denise Nestor
One of the thorniest problems researchers sought to address was the link between thought and language. Philosophers and linguists long held that language must have evolved not for the purpose of communication but to facilitate abstract thought. The grammatical rules that structure language, a feature of languages from Algonquin to American Sign Language, are more complex than necessary for communication. Language, the argument went, must have evolved to help us think, in much the same way that mathematical notations allow us to make complex calculations.
Ev Fedorenko, a cognitive neuroscientist at M.I.T., thought this was “a cool idea,” so, about a decade ago, she set out to test it. If language is the medium of thought, she reasoned, then thinking a thought and absorbing the meaning of spoken or written words should activate the same neural circuits in the brain, like two streams fed by the same underground spring. Earlier brain-imaging studies showed that patients with severe aphasia could still solve mathematical problems, despite their difficulty in deciphering or producing language, but failed to pinpoint distinctions between brain regions dedicated to thought and those dedicated to language. Fedorenko suspected that might be because the precise location of these regions varied from individual to individual. In a 2011 study, she asked healthy subjects to make computations and decipher snatches of spoken and written language while she watched how blood flowed to aroused parts of their brains using an M.R.I. machine, taking their unique neural circuitry into account in her subsequent analysis. Her fM.R.I. studies showed that thinking thoughts and decoding words mobilized distinct brain pathways. Language and thought, Fedorenko says, “really are separate in an adult human brain.”
At the University of Edinburgh, Kirby hit upon a process that might explain how language’s internal structure evolved. That structure, in which simple elements such as sounds and words are arranged into phrases and nested hierarchically within one another, gives language the power to generate an infinite number of meanings; it is a key feature of language as well as of mathematics and music. But its origins were hazy. Because children intuit the rules that govern linguistic structure with little if any explicit instruction, philosophers and linguists argued that it must be a product of some uniquely human cognitive process. But researchers who scrutinized the fossil record to determine when and how that process evolved were stumped: The first sentences uttered left no trace behind.
Kirby designed an experiment to simulate the evolution of language inside his lab. First, he developed made-up codes to serve as proxies for the disordered collections of words widely believed to have preceded the emergence of structured language, such as random sequences of colored lights or a series of pantomimes. Then he recruited subjects to use the code under a variety of conditions and studied how the code changed. He asked subjects to use the code to solve communication tasks, for example, or to pass the code on to one another as in a game of telephone. He ran the experiment hundreds of times using different parameters on a variety of subjects, including on a colony of baboons living in a seminaturalistic enclosure equipped with a bank of computers on which they could choose to play his experimental games.
What he found was striking: Regardless of the native tongue of the subjects, or whether they were baboons, college students or robots, the results were the same. When individuals passed the code on to one another, the code became simpler but also less precise. But when they passed it on to one another and also used it to communicate, the code developed a distinct architecture. Random sequences of colored lights turned into richly patterned ones; convoluted, pantomimic gestures for words such as “church” or “police officer” became abstract, efficient signs. “We just saw, spontaneously emerging out of this experiment, the language structures we were waiting for,” Kirby says. His findings suggest that language’s mystical power — its ability to turn the noise of random signals into intelligible formulations — may have emerged from a humble trade-off: between simplicity, for ease of learning, and what Kirby called “expressiveness,” for unambiguous communication.
For Descartes, the equation of language with thought meant animals had no mental life at all: “The brutes,” he opined, “don’t have any thought.” Breaking the link between language and human biology didn’t just demystify language; it restored the possibility of mind to the animal world and repositioned linguistic capacities as theoretically accessible to any social species.
The search for the components of language in nonhuman animals now extends to the far reaches of our phylogenetic tree, encompassing creatures that may communicate in radically unfamiliar ways.
This summer, I met with Marcelo Magnasco, a biophysicist, and Diana Reiss, a psychologist at Hunter College who studies dolphin cognition, in Magnasco’s lab at Rockefeller University. Overlooking the East River, it was a warmly lit room, with rows of burbling tanks inhabited by octopuses, whose mysterious signals they hoped to decode. Magnasco became curious about the cognitive and communicative abilities of cephalopods while diving recreationally, he told me. Numerous times, he said, he encountered cephalopods and had “the overpowering impression that they were trying to communicate with me.” During the Covid-19 shutdown, when his work studying dolphin communication with Reiss was derailed, Magnasco found himself driving to a Petco in Staten Island to buy tanks for octopuses to live in his lab.
During my visit, the grayish pink tentacles of the octopus clinging to the side of the glass wall of her tank started to flash bright white. Was she angry? Was she trying to tell us something? Was she even aware of our presence? There was no way to know, Magnasco said. Earlier efforts to find linguistic capacities in other species failed, in part, he explained, because we assumed they would look like our own. But the communication systems of other species might, in fact, be “truly exotic to us,” Magnasco said. A species that can recognize objects by echolocation, as cetaceans and bats can, might communicate using acoustic pictographs, for example, which might sound to us like meaningless chirps or clicks. To disambiguate the meaning of animal signals, such as a string of dolphin clicks or whalesong, scientists needed some inkling of where meaning-encoding units began and ended, Reiss explained. “We, in fact, have no idea what the smallest unit is,” she said. If scientists analyze animal calls using the wrong segmentation, meaningful expressions turn into meaningless drivel: “ad ogra naway” instead of “a dog ran away.”
An international initiative called Project CETI, founded by David Gruber, a biologist at the City University of New York, hopes to get around this problem by feeding recordings of sperm-whale clicks, known as codas, into computer models, which might be able to discern patterns in them, in the same way that ChatGPT was able to grasp vocabulary and grammar in human language by analyzing publicly available text. Another method, Reiss says, is to provide animal subjects with artificial codes and observe how they use them.
Reiss’s research on dolphin cognition is one of a handful of projects on animal communication that date back to the 1980s, when there were widespread funding cuts in the field, after a top researcher retracted his much-hyped claim that a chimpanzee could be trained to use sign language to converse with humans. In a study published in 1993, Reiss offered bottlenose dolphins at a facility in Northern California an underwater keypad that allowed them to choose specific toys, which it delivered while emitting computer-generated whistles, like a kind of vending machine. The dolphins spontaneously began mimicking the computer-generated whistles when they played independently with the corresponding toy, like kids tossing a ball and naming it “ball, ball, ball,” Reiss told me. “The behavior,” Reiss said, “was strikingly similar to the early stages of language acquisition in children.”
The researchers hoped to replicate the method by outfitting an octopus tank with an interactive platform of some kind and observing how the octopus engaged with it. But it was unclear whether such a device might interest the lone cephalopod. An earlier episode of displeasure led her to discharge enough ink to turn her tank water so black that she couldn’t be seen. Unlocking her communicative abilities might require that she consider the scientists as fascinating as they did her.
While experimenting with animals trapped in cages and tanks can reveal their latent faculties, figuring out the range of what animals are communicating to one another requires spying on them in the wild. Past studies often conflated general communication, in which individuals extract meaning from signals sent by other individuals, with language’s more specific, flexible and open-ended system. In a seminal 1980 study, for example, the primatologists Robert Seyfarth and Dorothy Cheney used the “playback” technique to decode the meaning of alarm calls issued by vervet monkeys at Amboseli National Park in Kenya. When a recording of the barklike calls emitted by a vervet encountering a leopard was played back to other vervets, it sent them scampering into the trees. Recordings of the low grunts of a vervet who spotted an eagle led other vervets to look up into the sky; recordings of the high-pitched chutters emitted by a vervet upon noticing a python caused them to scan the ground.
At the time, The New York Times ran a front-page story heralding the discovery of a “rudimentary ‘language’” in vervet monkeys. But critics objected that the calls might not have any properties of language at all. Instead of being intentional messages to communicate meaning to others, the calls might be involuntary, emotion-driven sounds, like the cry of a hungry baby. Such involuntary expressions can transmit rich information to listeners, but unlike words and sentences, they don’t allow for discussion of things separated by time and space. The barks of a vervet in the throes of leopard-induced terror could alert other vervets to the presence of a leopard — but couldn’t provide any way to talk about, say, “the really smelly leopard who showed up at the ravine yesterday morning.”
Toshitaka Suzuki, an ethologist at the University of Tokyo who describes himself as an animal linguist, struck upon a method to disambiguate intentional calls from involuntary ones while soaking in a bath one day. When we spoke over Zoom, he showed me an image of a fluffy cloud. “If you hear the word ‘dog,’ you might see a dog,” he pointed out, as I gazed at the white mass. “If you hear the word ‘cat,’ you might see a cat.” That, he said, marks the difference between a word and a sound. “Words influence how we see objects,” he said. “Sounds do not.” Using playback studies, Suzuki determined that Japanese tits, songbirds that live in East Asian forests and that he has studied for more than 15 years, emit a special vocalization when they encounter snakes. When other Japanese tits heard a recording of the vocalization, which Suzuki dubbed the “jar jar” call, they searched the ground, as if looking for a snake. To determine whether “jar jar” meant “snake” in Japanese tit, he added another element to his experiments: an eight-inch stick, which he dragged along the surface of a tree using hidden strings. Usually, Suzuki found, the birds ignored the stick. It was, by his analogy, a passing cloud. But then he played a recording of the “jar jar” call. In that case, the stick seemed to take on new significance: The birds approached the stick, as if examining whether it was, in fact, a snake. Like a word, the “jar jar” call had changed their perception.
Cat Hobaiter, a primatologist at the University of St. Andrews who works with great apes, developed a similarly nuanced method. Because great apes appear to have a relatively limited repertoire of vocalizations, Hobaiter studies their gestures. For years, she and her collaborators have followed chimps in the Budongo forest and gorillas in Bwindi in Uganda, recording their gestures and how others respond to them. “Basically, my job is to get up in the morning to get the chimps when they’re coming down out of the tree, or the gorillas when they’re coming out of the nest, and just to spend the day with them,” she told me. So far, she says, she has recorded about 15,600 instances of gestured exchanges between apes.
To determine whether the gestures are involuntary or intentional, she uses a method adapted from research on human babies. Hobaiter looks for signals that evoke what she calls an “Apparently Satisfactory Outcome.” The method draws on the theory that involuntary signals continue even after listeners have understood their meaning, while intentional ones stop once the signaler realizes her listener has comprehended the signal. It’s the difference between the continued wailing of a hungry baby after her parents have gone to fetch a bottle, Hobaiter explains, and my entreaties to you to pour me some coffee, which cease once you start reaching for the coffeepot. To search for a pattern, she says she and her researchers have looked “across hundreds of cases and dozens of gestures and different individuals using the same gesture across different days.” So far, her team’s analysis of 15 years’ worth of video-recorded exchanges has pinpointed dozens of ape gestures that trigger “apparently satisfactory outcomes.”
These gestures may also be legible to us, albeit beneath our conscious awareness. Hobaiter applied her technique to pre-verbal 1- and 2-year-old children, following them around recording their gestures and how they affected attentive others, “like they’re tiny apes, which they basically are,” she says. She also posted short video clips of ape gestures online and asked adult visitors who’d never spent any time with great apes to guess what the gestures meant. She found that pre-verbal human children use at least 40 or 50 gestures from the ape repertoire, and adults correctly guessed the meaning of video-recorded ape gestures at a rate “significantly higher than expected by chance,” as Hobaiter and Kirsty E. Graham, a postdoctoral research fellow in Hobaiter’s lab, reported in a 2023 paper for PLOS Biology.
The emerging research might seem to suggest that there’s nothing very special about human language. Other species use intentional wordlike signals just as we do. Some, such as Japanese tits and pied babblers, have been known to combine different signals to make new meanings. Many species are social and practice cultural transmission, satisfying what might be a prerequisite for a structured communication system like language. And yet a stubborn fact remains. The species that use features of language in their communications have few obvious geographical or phylogenetic similarities. And despite years of searching, no one has discovered a communication system with all the properties of language in any species other than our own.
For some scientists, the mounting evidence of cognitive and linguistic continuities between humans and animals outweighs evidence of any gaps. “There really isn’t such a sharp distinction,” Jarvis, now at Rockefeller University, said in a podcast. Fedorenko agrees. The idea of a chasm separating man from beast is a product of “language elitism,” she says, as well as a myopic focus on “how different language is from everything else.”
But for others, the absence of clear evidence of all the components of language in other species is, in fact, evidence of their absence. In a 2016 book on language evolution titled “Why Only Us,” written with the linguist Robert C. Berwick, Chomsky describes animal communications as “radically different” from human language. Seyfarth and Cheney, in a 2018 book, note the “striking discontinuities” between human and nonhuman loquacity. Animal calls may be modifiable; they may be voluntary and intentional. But they’re rarely combined according to rules in the way that human words are and “appear to convey only limited information,” they write. If animals had anything like the full suite of linguistic components we do, Kirby says, we would know by now. Animals with similar cognitive and social capacities to ours rarely express themselves systematically the way we do, with systemwide cues to distinguish different categories of meaning. “We just don’t see that kind of level of systematicity in the communication systems of other species,” Kirby said in a 2021 talk.
This evolutionary anomaly may seem strange if you consider language an unalloyed benefit. But what if it isn’t? Even the most wondrous abilities can have drawbacks. According to the popular “self-domestication” hypothesis of language’s origins, proposed by Kirby and James Thomas in a 2018 paper published in Biology & Philosophy, variable tones and inventive locutions might prevent members of a species from recognizing others of their kind. Or, as others have pointed out, they might draw the attention of predators. Such perils could help explain why domesticated species such as Bengalese finches have more complex and syntactically rich songs than their wild kin, the white-rumped munia, as discovered by the biopsychologist Kazuo Okanoya in 2012; why tamed foxes and domesticated canines exhibit heightened abilities to communicate, at least with humans, compared with wolves and wild foxes; and why humans, described by some experts as a domesticated species of their ape and hominin ancestors, might be the most talkative of all. A lingering gap between our abilities and those of other species, in other words, does not necessarily leave language stranded outside evolution. Perhaps, Fitch says, language is unique to Homo sapiens, but not in any unique way: special to humans in the same way the trunk is to the elephant and echolocation is to the bat.
The quest for language’s origins has yet to deliver King Solomon’s seal, a ring that magically bestows upon its wearer the power to speak to animals, or the future imagined in a short story by Ursula K. Le Guin, in which therolinguists pore over the manuscripts of ants, the “kinetic sea writings” of penguins and the “delicate, transient lyrics of the lichen.” Perhaps it never will. But what we know so far tethers us to our animal kin regardless. No longer marooned among mindless objects, we have emerged into a remade world, abuzz with the conversations of fellow thinking beings, however inscrutable.
Sonia Shah is a science journalist and the author, most recently, of “The Next Great Migration: The Beauty and Terror of Life on the Move.” She is currently writing a book on the history and science of human exceptionalism. Denise Nestor is an artist and illustrator in Dublin. She is known for her finely detailed hand-drawn art, often inspired by nature.