
On Recognizing Conspecifics

(Figure from Nature Neuroscience, reference 1)

Communication with members of one’s own species is extremely important for social animals. Non-verbal messages can signal socially significant events such as the presence of a predator or the movement of the group. It is therefore no great surprise that recent research has found brain areas in monkeys specialized for recognizing the voices of conspecifics1. This is a sensible sensory strategy, one which ensures that individuals are able to distinguish between the various growls and caws they might be privy to, and pluck out the ones most relevant to their continued survival.

In animals whose vocalizations transcend the guttural, even more specialization has been unearthed (unbrained?) in the skull. Work on zebra finches has demonstrated that there are neurons which are active specifically during the production of an individual’s own song (they have, after all, only one in a lifetime) or while the animal hears that song being played back. Again, we can confidently say that this type of anatomical customization is useful, since it allows the animal to monitor and potentially modulate its learning. In fact, it would be difficult to imagine the learning process without this type of helpful structure.

1. Petkov CI, Kayser C, Steudel T, Whittingstall K, Augath M, Logothetis NK. (2008) A voice region in the monkey brain. Nat Neurosci. 11(3):367-74.
2. Prather JF, Peters S, Nowicki S, Mooney R. (2008) Precise auditory-vocal mirroring in neurons for learned vocal communication. Nature. 451(7176):305-10.

On Frequency Tuning

You are able to distinguish between sounds whose frequencies differ by less than the bandwidth (a measure of the range of frequencies to which a sensor will respond) of the cells in your inner ear that transduce sound from pressure waves into electrical impulses in your brain. As the authors of a recent paper appearing in Nature report, this is probably achieved through the use of population coding1. However, certain aspects of this phenomenon remain mysterious.

(Figure 1. Cartoon of what a sound sensor’s response might look like; these numbers are not physically realistic.)

Suppose we wanted to determine the bandwidth of a sensor having the response properties depicted above. A standard way to do so is the following: measure the maximum response of the sensor (in this case, 10) and divide that value by two (thus, 5). Then find the smallest frequency that produces that half-max response of 5 (a bit less than 8Hz) and the largest frequency that produces it (a bit greater than 12Hz). The difference between the larger and smaller frequencies is termed the bandwidth; measured this way, it is also called the full width at half maximum (FWHM), for somewhat obvious reasons. We would then describe this sensor as having a central frequency of 10Hz, a Gaussian profile, and a bandwidth of roughly 4Hz.
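The half-maximum recipe above is easy to mechanize. Below is a minimal Python sketch, assuming a Gaussian sensor with a peak response of 10 at 10Hz and an assumed width parameter `sigma = 1.7` (the cartoon’s exact width isn’t given); the helper names `gaussian_response` and `fwhm` are hypothetical. It scans a fine grid of frequencies, keeps every frequency whose response reaches half the maximum, and takes the distance between the outermost two.

```python
import math

def gaussian_response(freq, center=10.0, peak=10.0, sigma=1.7):
    """Response of a hypothetical Gaussian-tuned sensor to a pure tone at `freq` Hz."""
    return peak * math.exp(-((freq - center) ** 2) / (2 * sigma ** 2))

def fwhm(response, f_lo, f_hi, step=0.001):
    """Full width at half maximum, found by scanning frequencies from f_lo to f_hi."""
    n = int(round((f_hi - f_lo) / step))
    freqs = [f_lo + i * step for i in range(n + 1)]
    values = [response(f) for f in freqs]
    half_max = max(values) / 2
    # Keep every frequency whose response reaches at least half the maximum;
    # the bandwidth is the distance between the outermost two.
    above = [f for f, v in zip(freqs, values) if v >= half_max]
    return max(above) - min(above)

SIGMA = 1.7  # assumed width, chosen to give a bandwidth of roughly 4 Hz
width = fwhm(lambda f: gaussian_response(f, sigma=SIGMA), 0.0, 20.0)
analytic = 2 * math.sqrt(2 * math.log(2)) * SIGMA

print(f"numerical FWHM: {width:.3f} Hz")
print(f"analytic FWHM:  {analytic:.3f} Hz")
```

For a Gaussian the scan agrees with the closed form FWHM = 2√(2 ln 2)·σ ≈ 2.355σ, which for this assumed σ comes out just over 4Hz, so the half-max points fall a hair below 8Hz and a hair above 12Hz, as in the cartoon.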

Now, let me reiterate: you are able to distinguish between sounds whose frequencies differ by less than the bandwidth of the cells that transduce sound from pressure waves into electrical impulses in your brain. This is odd for the following reason: suppose you were relying on the sensor above to tell you what frequencies (pitches) of sound you were hearing. If I played you a sound at 8Hz and another at 12Hz, the responses of the sensor (as you can see from the dotted lines on the figure above) would be identical. That sensor is unable to distinguish between sounds at 8Hz and sounds at 12Hz, yet somehow your brain can. The way it achieves this feat is through population coding: the brain almost always pools the responses of many sensory neurons in creating the conscious representations of sensory data that we experience.
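To make the ambiguity concrete, here is a minimal sketch using the same kind of hypothetical Gaussian sensor (centered at 10Hz, peak 10, assumed `sigma = 1.7`). Because the curve is symmetric about its center, any two tones equally far above and below 10Hz produce exactly the same response, so this sensor on its own cannot tell them apart.

```python
import math

def gaussian_response(freq, center=10.0, peak=10.0, sigma=1.7):
    """Response of a hypothetical Gaussian-tuned sensor to a pure tone at `freq` Hz."""
    return peak * math.exp(-((freq - center) ** 2) / (2 * sigma ** 2))

# Pairs of tones placed symmetrically around the 10 Hz center frequency.
pairs = [(8.0, 12.0), (9.0, 11.0), (7.5, 12.5)]
for low, high in pairs:
    r_low, r_high = gaussian_response(low), gaussian_response(high)
    same = math.isclose(r_low, r_high)
    print(f"{low} Hz -> {r_low:.2f}, {high} Hz -> {r_high:.2f}, identical: {same}")
```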

A brief aside: you may be asking, why don’t we just have sensors with different response properties, linear ones, say, like the one in the figure below?

That would work nicely, since the responses at 8Hz and 12Hz (and at any other pair of frequencies, for that matter) are distinct. However, it’s very difficult to build biological sensors with this kind of response profile, and in the interest of steering clear of unwieldy posts, I’ll leave it at that.
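For contrast, here is a minimal sketch of the hypothetical linear sensor from the aside; the slope and offset are arbitrary assumptions, and `linear_response` and `invert_linear` are illustrative names. Because the response rises monotonically with frequency, every frequency gets its own response value, and a downstream reader could recover the frequency simply by inverting the line.

```python
def linear_response(freq, slope=0.5, offset=0.0):
    """Hypothetical sensor whose response rises linearly with frequency."""
    return slope * freq + offset

def invert_linear(response, slope=0.5, offset=0.0):
    """Recover the tone frequency from a single linear sensor's response."""
    return (response - offset) / slope

for tone in (8.0, 12.0):
    r = linear_response(tone)
    print(f"{tone} Hz -> response {r:.1f} -> decoded {invert_linear(r):.1f} Hz")
```

Real hair cells don’t behave this way, as the post notes; the sketch only shows why such a profile would sidestep the ambiguity.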

Returning to population coding, I’ve said that the brain pools responses, but what does this mean exactly?

Let us now imagine that we examined the responses of two cells, with central frequencies of 9Hz and 11Hz, respectively. At 8Hz, cell 1’s response is ~8.5 and cell 2’s is ~2.5, while at 12Hz the situation is flipped: cell 1’s response is ~2.5 and cell 2’s is ~8.5. This neat reversal is not an inherent feature of the system; it is simply a result of my symmetric, simplified illustration. Together, these two cells achieve what a sole actor cannot: they tell the difference between two sounds separated by less than their individual bandwidths. All that is needed now is a further cell (in reality, another layer of cells) to read off this code. “Whenever cell 1 says ‘8.5’ and cell 2 says ‘2.5,’ I know that a sound is being played at 8Hz,” this further cell says.
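The two-cell scheme, including the “further cell” that reads the code, can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism reported in the paper: both cells are assumed to be Gaussian with peak 10 and `sigma = 1.7`, centered at 9Hz and 11Hz as in the text, and the read-out is a hypothetical least-squares template match that tries candidate frequencies on a grid and picks the one whose predicted pair of responses best matches the observed pair. With these assumptions the responses at 8Hz come out near 8.4 and 2.1, close to the rounded values above.

```python
import math

def gaussian_response(freq, center, peak=10.0, sigma=1.7):
    """Pure-tone response of a hypothetical Gaussian-tuned cell."""
    return peak * math.exp(-((freq - center) ** 2) / (2 * sigma ** 2))

CENTERS = (9.0, 11.0)  # central frequencies of cell 1 and cell 2, as in the text

def population_response(freq):
    """Pair of responses (cell 1, cell 2) to a pure tone at `freq` Hz."""
    return tuple(gaussian_response(freq, c) for c in CENTERS)

def decode(responses, f_lo=5.0, f_hi=15.0, step=0.01):
    """Hypothetical read-out cell: choose the candidate frequency whose expected
    population response is closest (in squared error) to the observed one."""
    n = int(round((f_hi - f_lo) / step))
    best_f, best_err = None, float("inf")
    for i in range(n + 1):
        f = f_lo + i * step
        expected = population_response(f)
        err = sum((o - e) ** 2 for o, e in zip(responses, expected))
        if err < best_err:
            best_f, best_err = f, err
    return best_f

for tone in (8.0, 12.0):
    r1, r2 = population_response(tone)
    print(f"{tone} Hz -> cell 1: {r1:.2f}, cell 2: {r2:.2f}, decoded: {decode((r1, r2)):.2f} Hz")
```

Template matching is only one possible read-out; it works here because the ratio of the two cells’ responses changes monotonically with frequency, so no two tones in the grid give the same pair.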

This simplified view is not so far off from what we think is happening in the transformation of signals from the sensory periphery (your ear) to central processing areas (primary auditory cortex).

And now on to the mysterious facet mentioned earlier. These intrepid explorers of frequency tuning in primary auditory cortex found cells there with very small bandwidths compared to those of the sensory cells. This implies that the cortical cells perform a computation similar to the one I’ve outlined above, but to test this hypothesis the researchers had to employ a different strategy from the one used for building frequency tuning curves.

In constructing the auditory response profile of a single cell, one generally uses single-frequency sounds, or pure tones. However, the brain was built to represent the real world, a place where single-frequency sounds are essentially never encountered, so a description of a cell’s response obtained in this manner is necessarily incomplete. Though it is possible, there is no reason to expect that one can predict how a cell will respond to the simultaneous playback of 8Hz and 12Hz tones by simply summing the individual responses elicited by 8Hz and by 12Hz. The heuristic version of population coding that I presented, however, does make exactly that prediction, so recording the responses of these single cells to complex sounds allows these auditory neuroscientists to test their hypothesis about the underlying computation and the wiring of the brain.
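The logic of this test can be sketched in a few lines. This is a hypothetical illustration, not the analysis the researchers performed: it takes one assumed Gaussian cell (centered at 9Hz, peak 10, `sigma = 1.7`) and compares the linear-summation prediction, in which the response to a two-tone chord is the sum of the responses to each tone alone, with one arbitrary nonlinear alternative in which the summed drive saturates at a ceiling (assumed here to be 10). Both models agree on every pure tone, so pure-tone tuning curves alone cannot tell them apart; only the chord pulls their predictions apart.

```python
import math

def gaussian_response(freq, center=9.0, peak=10.0, sigma=1.7):
    """Pure-tone response of a hypothetical Gaussian-tuned cell."""
    return peak * math.exp(-((freq - center) ** 2) / (2 * sigma ** 2))

def linear_prediction(tones):
    """Linear summation: a chord evokes the sum of its components' responses."""
    return sum(gaussian_response(f) for f in tones)

def saturating_prediction(tones, ceiling=10.0):
    """One hypothetical nonlinear alternative: summed drive is capped at a ceiling."""
    return min(linear_prediction(tones), ceiling)

for stimulus in ([8.0], [12.0], [8.0, 12.0]):
    lin = linear_prediction(stimulus)
    sat = saturating_prediction(stimulus)
    print(f"tones {stimulus}: linear {lin:.2f}, saturating {sat:.2f}")
```

In this toy setup, recording the cell’s response to the chord and checking which prediction (if either) it matches is what lets an experiment discriminate between the two hypotheses.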

Before I conclude, I want to mention that this research is of a rare and important type: it was performed on humans. This is not some sort of needless invasion; in epilepsy patients it is unfortunately necessary to probe the electrical responses of the brain in order to locate and remove the small regions that cause their seizures.

It will probably come as no surprise that the responses predicted by the linear model I’ve discussed were quite different from those the researchers found. This is exciting because it means that the brain has yet again presented us with a puzzle to solve. We know what the brain must be doing; the question is how. Exploring such quandaries can yield results that expand our general knowledge, find applications in other fields, and give us insight into the very nature of how we function. Such is the beauty of neuroscience.


1. Bitterman Y, Mukamel R, Malach R, Fried I, Nelken I. (2008) Ultra-fine frequency tuning revealed in single neurons of human auditory cortex. Nature. 451:197-201. doi:10.1038/nature06476


I was riding the NYC subway listening to my iPod the other day when its battery ran out (hard to relate to such an experience, I know). I was a bit vexed because I had Massive Attack’s “Lately” stuck in my head and really wanted to scratch that itch. I realized that by focusing on the song, I was able to produce a damned good internal manifestation of it. I immediately tried to do the same with a visual image, Max Ernst’s The Elephant Celebes (1921), but I could only remember the positions and placement of objects; if I focused, I could recall the pleasing quality of the soft swaths of dark gray punctuated with silvery white that make up the central figure of the canvas, but never a detailed, full image. Perhaps some people can summon perfect pictures of a loved one’s face, but personally I’ve never been able to do that; only by relying on some other form of memory, like a happy event, am I able to better recall faces. I am, however, explicitly trying to avoid such considerations, because they lead straight into one of the classic problems in confronting human memory: its capacity and quality are completely contingent on context. Despite all that is known and available to read on this subject, my inward examination led me to think about memories of unimodal (one sense at a time) sensory experiences in general.

I am really treading on thin philosophical and scientific ice by using introspection as my main mode of exploration, but this is meant to be neither philosophy nor science, merely thought-provoking. Because this is such personal territory, there will obviously be some variation from person to person. In the extreme, for instance, a man blind from birth will find it decidedly impossible to recall any visual image, and can probably recall sounds better than most people with sight. This person-to-person variation may have something to do with inherent differences in brain structure, including in people who completely lack a sensory apparatus. So before I do a little rundown of the various sensory systems, allow me a digression about brains, starting from auditory stimuli, that may facilitate the discussion to follow.

Music isn’t a very general example of an auditory stimulus, and this may have something to do with the fidelity of the remembered experience. A few factors that might be relevant immediately come to mind: (1) the amount of cortex devoted to representing the type of stimulus in question, (2) the involvement of mirror neurons, and (3) the temporal quality of music.

The cerebral cortex, as it is “properly” referred to, is the outermost few millimeters of the brain of higher organisms. The wrinkled appearance of a brain (if you’ve ever seen an image of one) is thought to be a way to pack more cortex into the skull. This is where the brain does its most complex information processing. It is here that one can find single neurons (brain cells) which respond* to the various senses in such complex ways that a single cell will react best when you are looking at a picture of Bill Clinton rather than, say, a car or your grandmother.

Mirror neurons are wonderful little devices in your head which respond both when you perform an action and when you observe another individual performing the same action. For example, if you reach out and pick up an apple, the same mirror neuron will fire no matter how you do it, whether you use a set of tongs or daintily pick the apple up by the stem. The same is true of the observed act, so it seems that mirror neurons encode the intention behind an action. They’re very important for social interaction and learning and a host of other things, and they probably deserve their own post, but for now they are at the service of my argument about music and paintings.

The amount of cortex devoted to vision far outweighs that devoted to any other sensory modality; the visual cortex is certainly larger than the primary auditory cortex. So it may simply be the case that it is more difficult for memories to light up all of the various parts of the visual cortex that are necessary to generate a truly accurate experience of sight.

When you hear somebody speak, mirror neurons potentiate (that is, make ready and facilitate the use of) the parts of your brain used for vocalizing. This even goes so far as to provoke measurable electrical responses in the muscles of one’s throat. When you watch somebody prick themselves, it is thought that the mirror neuron system contributes to any sensation of pain or touch you might experience as a result. It may thus be that when one is listening to music with singing, the mirror neuron system strengthens the auditory cortex’s memory-based activity.

As for the temporal quality of music, it just doesn’t seem that relevant. I’m no more likely to be able to remember a series of images (unless it’s the final frames of Truffaut’s “The 400 Blows”) than I am to remember a single image.

Now, let’s see whether these two ideas tell us anything when we examine other senses. Consider the following eight senses (What happened to five, you ask? Well, we need all these categories because proprioception, the vestibular sense, and interoception don’t really fit into the classic five.) They are ordered roughly by the amount of cortex devoted to them, from most to least.

  1. Vision
  2. Somatosensation (touch)
  3. Audition
  4. Proprioception (muscle movement, posture)
  5. Gustation (taste)
  6. Olfaction (smell)
  7. Vestibular (balance, orientation)
  8. Interoception (hunger, thirst, drowsiness, air hunger, etc.)

This seems to immediately invalidate the suggestion that the amount of cortex devoted to a modality is what’s relevant: if less cortex made recall easier, taste and smell should be the easiest experiences of all to relive. Yet I have a very difficult, if not impossible, time remembering the experience of eating duck at WD-50, even though the gustatory and olfactory cortical areas combined are smaller than the primary auditory cortex. As to the involvement of mirror neurons, it is incredibly difficult to assess. This is because one can’t really activate the mirror neuron system except through vision or audition, so its potential utility in enhancing other unimodal memories is essentially nil. It might, however, facilitate the memory of a great LeBron James dunk or a beautiful Alvin Ailey dance piece.

Despite this difficulty, it still seems to me that there is something extremely special about music. If I try to remember a series of isolated noises that I’ve heard, the exercise doesn’t even really make sense. I can think of specific sounds and noises: my fan blowing over my body on a hot summer night, the rising whine of a newer subway car as it picks up speed out of the station, a fluorescent bulb’s hum in the lab where I work. The problem with all of these is that I am unable to call them up without the associated visual experience as well, and then we’re back to the context/multimodal issue. We must consider the possibility that our ability to both hear and make sounds facilitates a mirror-neuron-based enhancement of all music, vocal or otherwise; many instruments produce sounds well within the range of frequencies that we produce, even if we can’t match their spectral qualities. I would really like to know whether anybody out there has a different experience of memory from the general one I’ve described here, or has theories about why we might perceive things in this way.

* I know “respond” is a loaded word, but I’ve got to cut this increasingly reductionist explanation off somewhere; if you’d like an explanation of what I mean by “respond,” please email me and I’ll be happy to oblige.