Honors Thesis Preliminary Draft

Assessing Superiority of Studio Headphones

 

Israel Edery

Advisor: Prof. Natalie Kacinik, Ph.D.

PSYC 5002 Independent Research

Spring 2023

 

Department of Psychology

Brooklyn College of The City University of New York

 


 

Author’s Note

I would like to give special thanks to Dr. Kacinik for her time, flexibility, and mentorship throughout this project.

 

Abstract 

Our project investigates whether average music listeners can detect differences in sound quality. Specifically, can non-musician music consumers blindly detect differences between high- and low-quality studio headphones? For this experiment, participants visited our lab on two occasions to listen to music. Without being told, participants were given low-quality headphones during one lab visit and substantially higher-quality headphones during the other. During each listening session, participants filled out surveys evaluating their emotional reaction to, and degree of enjoyment of, two musical compositions. We hypothesized that participants would rate their experience more positively when using the higher-quality studio headphones. Data are currently being collected and analyzed. Results will be published in the coming months.

Introduction

In exploring the underlying emotions that contribute to positive experiences of art, Eskine, Kacinik, & Prinz (2012) found that individuals in a state of fear rated visual art more positively than those in a state of happiness or physical arousal. This was established through an experiment with five conditions, each representing a different kind or level of arousal. Eskine et al. (2012) had one group of subjects watch a scary video clip, another group watch a happy video clip, and the third and fourth groups do either 15 or 30 jumping jacks, while the final control group was asked to sit normally before viewing the pieces of art. Following these conditions, the subjects were asked to rate several pieces of abstract visual art. The results showed significantly more positive judgments of the artwork from those in the fear-induced condition. Extending the Eskine et al. (2012) findings regarding variables that contribute to positive judgments of visual art, this study aims to examine a potentially important variable in the experience of music listening.

As in visual art, the setting and the medium (the type of device) used when listening to music can affect how individuals rate a song or melody they have just heard. Today, people listen to music through mediums ranging from simple wired earphones to complex music systems developed for automotive audio or professional recording-studio close listening. Likewise, settings of music exposure range from subtle supermarket background music to blasting music at a concert or music festival.

One’s degree of enjoyment of, or emotional reaction to, a musical composition is heavily dependent on factors like listening medium and setting. One example of the influence a setting can have comes from Roose & Stichele (2010), who established that people are open to different types of music depending on where the music is being listened to (i.e., private vs. public space). They conducted a large-scale survey of the Flemish population in Belgium to investigate music consumption habits by contrasting the types of music listened to at home with the types of concerts attended by participants. The results showed that at home, participants were open to listening to a wide variety of music. Conversely, the participants’ openness to concerts with differing genres of music (especially classical, “high-brow” concerts) was much more limited. Being substantially more “omnivorous” with music consumption in the privacy of one’s home suggests that social context plays a role in how and where music is consumed (Roose & Stichele, 2010).

Similarly, Egermann, Sutherland, Grewe, Nagel, Kopiez, & Altenmüller (2011) measured the psychological and physiological emotional reactions of a group of musicians to 10 pieces of music to determine whether they would react differently when listening to the music alone rather than with the group. Their findings showed higher skin conductance and more chills when the participants listened to the music alone, without the other group members. These findings further highlight the significant influence that an environmental setting can have on someone’s music listening experience.

There is currently a wide array of options for listening to music, including various speakers and sound systems from companies like JBL, Bose, and Sony. For personal use in particular, on-ear and over-ear headphones continue to be an increasingly popular medium for music listening. In a study published by the Audio Engineering Society regarding headphone sound-quality preferences, Olive, Welti, & McMullin (2014) note that headphone sales had reached $8.2 billion by the year 2013. More recently, Statista Market Insights reported that revenue in the headphones market amounts to US$17.56 billion in 2023, with the market expected to grow annually by 2.42%.

Despite the substantial growth of the headphone industry in recent years, along with its future potential, there currently exists a limited amount of academic research addressing specific questions regarding consumer headphone preferences (Olive et al., 2014). Furthermore, to the extent we have searched, there do not appear to be any replications of, or studies related to, the few relevant publications on headphone preferences that were published through 2014. Consequently, the basic empirical question of whether higher-quality studio headphones are worth the investment for the average consumer remains unanswered, and answering it is the primary purpose of this study.

General questions regarding sound-quality preferences in music listening have been investigated in the past. As early as 1956, Kirk published findings on loudspeaker bandwidth preferences and on the effects of prior exposure to higher-quality audio on music listening preferences. Specifically, Kirk (1956) had one experimental group listen to music reproduced over a more limited 180-3000 cps frequency range, while the other group listened to music over a significantly wider 30-15000 cps frequency range. Cycles per second, abbreviated as “cps,” is the unit of sound frequency now known as hertz (Hz). Kirk (1956) reported that the average college student preferred listening to music through speakers with a narrower bandwidth and a lower level of accuracy. However, following six and a half weeks of exposure to the wider-bandwidth, more accurate speakers, the students ended up preferring the better speakers. Based on these findings, Kirk (1956) concluded that exposure and learning play a critical role in sound quality preferences. Multiple additional studies have since utilized different loudspeaker systems to establish a bandwidth and sound frequency calibration consensus among trained musicians, sound technicians and engineers, and untrained listeners (Olive, 2003; Rumsey, Zielinski, Kassier, & Bech, 2005). The findings in these publications are consistent with Kirk’s (1956) general theory that learning and training play a role in one’s ability to develop an appreciation for high-quality sound.

With the advent of MP3 players, which deliver significantly lower sound quality than CDs because of how MP3 music files are compressed and stored, Pras, Zimmerman, Levitin, & Guastavino (2009) conducted an experiment establishing that trained sound engineers and musicians could easily detect the difference in sound quality between MP3 (96-320 kbit/s) and CD (44.1 kHz, 16-bit) formats. The music in Pras et al. (2009) was delivered to the participants through professional loudspeakers in a controlled listening room designed for close listening. The quality level of the sound (MP3 vs. CD, etc.) was manipulated digitally. The listening preferences for CD quality versus MP3 file formats at different bit rates (96, 128, 192, 256, and 320 kbit/s) were assessed using both musicians and trained sound engineers. Interestingly, Pras et al. (2009) found that the trained professional musicians significantly preferred CD-quality music to MP3 versions, up to bit rates of 192 kbit/s. At higher bit rates, professional listeners had difficulty discriminating between the MP3 and CD versions. The trained sound engineers were better at discriminating than the musicians. Regardless, Pras et al. (2009) established professional musicians’ ability to appreciate higher-quality audio.

As a preface to information in upcoming paragraphs, the term “accurately calibrated” refers to a standard established by professional musicians and sound engineers. It signifies sound that is properly balanced: not highlighting the low (i.e., bass) or high (treble) frequencies. Likewise, this standard includes ensuring that the music is “anechoic,” meaning that there are no reverberating echoes involved in the recording or sound production. This generally means that the music sounds relatively flat yet precisely representative of the original sound properly recorded from the instrument(s) used in that recording.

Since the Pras et al. (2009) study included only musically trained professionals and did not include any untrained listeners, Olive (2012) performed a follow-up study that included a younger, untrained high-school and college population. In this case, two abilities were assessed: first, the students’ ability to discriminate between MP3 (128 kbit/s) and CD-quality (44.1 kHz, 16-bit) formats, and second, their ability to discriminate between less and more accurately calibrated loudspeakers. The results from both listening tests indicated that untrained participants were also able to discern and appreciate a better quality of reproduced sound when given the opportunity to directly compare it to lower-quality options (Olive, 2012). While Olive (2012) provides compelling evidence that untrained listeners can detect differences in sound quality, this evidence was only established using professional loudspeakers. Since headphones were not used in any of the above experiments, whether these findings apply to people’s ability to discern between lower- and higher-quality headphones remains to be determined.

Outside of recent research published by the Audio Engineering Society, academic consensus on the significance of sound quality and the potential effects of listening to music via high-quality studio headphones appears split. Studies like Grewe, Nagel, Kopiez, & Altenmüller (2007), which explored physical “chill” or “goose bumps” reactions to music, used high-quality headphones and took significant measures to ensure high sound quality. These measures included the use of Beyerdynamic DT 770 Pro headphones, which are highly regarded by musicians, in combination with a USB sound card to ensure that the sound was transmitted properly to the headphones. Grewe et al. (2007) were interested in the effects of acoustical elements, vocals, and volume on emotional reactions to music. This may explain why they took extra measures, and they indeed reported significant findings, especially with regard to the effects of musical patterns, vocals, and volume.

Conversely, other studies related to musical preference have not considered the sound medium at all. For instance, a study by Greenberg, Kosinski, Stillwell, Monteiro, Levitin, & Rentfrow (2016) focused on assessing potential relationships between personality traits and song preference. While this study took into account age and hearing deficits, citing similar implementations in prior research by Bonneville-Roussy, Rentfrow, Xu, & Potter (2013), it did not consider the potential impact of the quality of the headphones used by its participants.

Given that Greenberg et al. (2016) did not consider the kind of medium used by their participants, the quality of the sound medium likely varied widely across participants, especially given their large sample size (N = 9,454). With some participants potentially using high-quality headphones and others cheap earphones or laptop computer speakers, it is highly likely that the data collected were affected by the kind of headphones or speakers they were using. For example, it is possible that someone’s positive rating of a particular song had more to do with their use of high-quality headphones and less to do with their personality traits. As a result, the validity of their results may have been undermined by a third variable, namely the sound quality of the listening device used by each participant.

The value of a good pair of headphones is widely known among professional and many non-professional music enthusiasts. Pop stars like Ariana Grande and John Mayer are seen in recording studios, and even live concerts, using high-quality studio headphones from companies like Audio Technica and Beyerdynamic (Ariana Grande, 2021; Dead & Company, 2021). With time, many non-professional music enthusiasts have become increasingly wise to the fact that an investment of $150-300 for a set of studio-level headphones opens a window into a substantially richer music listening experience. It is thus no surprise that headphone sales have seen such significant growth and hundreds of YouTube videos have been dedicated to evaluating differences between the many options of headphones that are currently on the market. 

Because of the precise and immersive experience that headphones deliver, artists like Ariel Posen (2019) sometimes elect to perform to smaller audiences through studio headphones. This means that instead of hearing the music through a loudspeaker, where by the time the music gets to the audience the sound can be distorted, each individual is given a pair of studio headphones for the duration of the performance. The use of studio headphones among musicians in this manner further highlights the consensus that high-quality studio headphones offer uniquely meaningful musical experiences. 

Despite the limited academic research currently available, we were able to find two studies specifically addressing headphone preferences. The first study, by Olive, Welti, & McMullin (2013), sought to establish consensus among trained listeners on the most preferred headphone frequency curve calibration. Olive et al. (2013) found that the trained listeners all preferred the headphone calibration that most resembled an “accurately calibrated” recording studio speaker system (as described earlier). However, since this study only included trained listeners, it is unknown whether the results regarding the most preferred headphones would also apply to the average consumer, who might prefer headphones that have a different calibration.

The second study, published by Olive, Welti, & McMullin (2014), included 238 musically trained and untrained (non-musician) participants from multiple countries. In line with Olive et al. (2013), Olive et al. (2014) found that, on average, listeners preferred headphones that reflected an accurately calibrated loudspeaker system typically used for close listening in recording studios. This was generally true regardless of the listener’s experience, age, gender, and culture.

Applied to the average headphone consumer, the Olive et al. (2014) findings suggest that even untrained listeners can distinguish between different types of headphones and would prefer the most accurately calibrated ones. However, it is important to note that the study compared four types of headphones with prices ranging from $119 to $1500, and that the cheapest headphones were the most preferred because of their accurate calibration. This interesting finding still does not shed light on the more fundamental question of whether high-quality studio headphones (when compared to cheap, low-quality ones) are really worth the investment for the average music consumer.

At present, we are not aware of any similar studies that have further compared preferences between low- and high-quality studio headphones. Hence, the current study is designed to test the hypothesis that high-quality studio headphones are not only noticeably better than cheaper ones, but also provide listeners with a significantly more immersive and meaningful experience. To test this hypothesis, we designed a survey-based quantitative experiment in which the participants blindly rate their music-listening experience using both high- and low-quality headphones. Based on how the participants rate their experiences across the two headphone types, we hope to establish whether a difference is detectable and whether the higher-quality, more expensive headphones are indeed noticeably better.

Survey Based on Gricean Communicative Maxims

Grice’s four maxims of communication are a well-known model used to assess different aspects of communication (Grice, 1975). This model breaks down human communication into quantity, quality, relation, and manner. A recent study by Dolese and Kozbelt (2021) established that these maxims can also be used to assess the quality of visual art, which can be conceptualized as a form of communication between the artist and the art consumer. Based on a prior 62-item survey that utilized the four Gricean maxims to assess aspects of communication, Dolese and Kozbelt (2021) created a modified 33-item survey using the four maxims plus additional measures that would capture communication in visual art. With this 33-item list, Dolese and Kozbelt (2021) established consistency between the subjects’ ratings, confirming the utility of these maxims in evaluating aesthetic liking.

For the purposes of this project, we narrowed the survey down further to the 15 items most appropriate for the assessment of music (from the original set of 33), with each item rated on a scale from 1 (not at all) to 7 (definitely). Extending how Dolese and Kozbelt (2021) used the Gricean maxims to measure communication in visual art, our 15-item survey was designed to target similar elements in communication that take place between a musician and their audience. Specifically, our survey includes questions targeting the four maxims of quantity, quality, relation, and manner – plus an added measure of “preference” that was introduced by Dolese and Kozbelt (2021).

Measuring Emotional Response

In addition to the 15 questions evaluating the participants’ overall impression of and relation to each musical composition, participants were also asked to rate their emotional reactions to each composition on a 5-point scale, with 1 indicating “strongly disagree” and 5 “strongly agree.”

This scale was developed from the Eskine et al. (2012) sublime measures scale, designed to measure emotional reactions to pieces of art utilizing dimensions of sublime experiences originally conceptualized by Burke (2008). Examples of dimensions used are (“I found this composition…”) moving, memorable, joyful, etc. Some negatively valenced items like “dull” and “uninteresting” were also included to further assess and ensure the reliability of the data collected.
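One common way to combine positively and negatively worded items into a single score is to reverse-score the negative ones first. The thesis does not state that this was done, so the following is only a minimal sketch under that assumption; the item set, the function names, and the example responses are hypothetical.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # the 5-point agreement scale described above

# Assumption: these are treated as the negatively valenced items.
NEGATIVE_ITEMS = {"dull", "uninteresting"}


def reverse_score(rating: int) -> int:
    """Flip a rating so that 5 becomes 1, 4 becomes 2, and so on."""
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"rating {rating} is outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - rating


def composite_score(ratings: dict[str, int]) -> float:
    """Average all items after reverse-scoring the negatively valenced ones."""
    adjusted = [
        reverse_score(value) if item in NEGATIVE_ITEMS else value
        for item, value in ratings.items()
    ]
    return mean(adjusted)


# Made-up responses from one imaginary participant, for illustration only.
example = {"moving": 4, "memorable": 5, "joyful": 3, "dull": 2, "uninteresting": 1}
print(composite_score(example))
```

Comparing the reverse-scored negative items against the positive ones is also one simple way to flag inconsistent responding, for example a participant who strongly agrees with both “joyful” and “dull.”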

Control/Other Questions

Since several potential issues could affect the validity of our data, we compiled a set of questions to ask participants before and after the experiment. These included questions about the individual’s hearing ability, familiarity with the compositions presented, and general musical skill and knowledge level. We also included more direct questions regarding the participant’s knowledge of the headphones that were used for the experiment. If a participant reported being familiar with Audio Technica headphones, their survey data would be excluded from the analyses.

Methods

Materials

For the purposes of this experiment, we used Audio Technica’s ATH-M20X and M70X models with their prices currently at $49 and $295 respectively. Most sound technicians would consider the $49 pair low-quality, and the $295 pair would be considered high-quality. These price points and pairs are also generally reflective of what consumers are willing to spend in each category of headphones. Despite the ATH-M20X and M70X headphones being vastly different in quality, they look very similar (see appendix for photos). As elaborated in the procedures section, having the two headphones look similar was important since the goal of the experiment was to see if participants would blindly rate these two headphone models differently. For the control condition, we used the regular speakers on a laptop computer. With the headphones, it’s also important to note that we used a soundcard (by M-Audio) to ensure the audio signal was transmitted properly and consistently.

Song Selection

To control for the possibility that someone will resonate with a particular theme of a song more than others, or that some may comprehend certain kinds of lyrics better than others, we decided to use songs without lyrics for this initial exploration. A website called “artist.io” was consulted to find two suitable tracks for the experiment, with the intention of finding compositions that were likely not familiar to the general public. Since it is possible that individuals might rate familiar songs more favorably, choosing lesser-known music was our way of ensuring that participant evaluations would not be influenced by prior exposure. We selected two compositions: a dramatic, slower-paced piece titled Taste of Defeat by Rotem Moav, and a faster, more upbeat composition called Kingdom Come by Theevs. Selecting two songs also allows us to investigate whether the headphones produce different reactions for one type of composition than for the other, or similar reactions across both pieces.

Procedures

Recruitment/Population

The participants for this project will mainly be recruited from the diverse population of Brooklyn College students ages 18-40 using the Psychology Department’s online Sona system in exchange for course credit. The recruitment posting will specify that participants must have healthy hearing. Some participants may also be recruited from flyers posted around campus and compensated with a payment of $15.

Design Overview

All research participants were required to attend two music-listening sessions, each lasting approximately 20-30 minutes. The participants were randomly assigned to either an experimental or a control group, both following similar procedures with a few minor differences, as outlined below. The purpose of the control group was to give us a set of baseline values for comparing the experience of listening through headphones to listening with no headphones at all. The distinction between the two groups was that the control analysis was a between-subjects comparison (control/laptop speaker condition vs. experimental/headphones condition subjects), whereas the experimental group analysis allowed for a within-subjects comparison (low-quality vs. high-quality headphones).

Experimental Group

Upon each participant’s arrival at our lab for their first session, we confirmed their eligibility for the experiment by reviewing the requirements for participation, particularly the absence of any hearing difficulties or impairment. After providing their written consent, participants listened to two musical compositions (one slow and one upbeat), each lasting slightly over three minutes. After each of the two compositions, the subjects rated their listening experience by responding to the previously described 15-item Gricean-style survey, which was presented on the computer. This was followed by ratings of their emotional reactions to the music on nine emotional dimensions (memorable, intense, suspenseful, joyful, powerful, unimportant, dull, weak, uninteresting), each rated on a 5-point scale with 1 indicating “strongly disagree” and 5 “strongly agree.”

The second listening session, scheduled about 2 weeks after the first with a leeway of plus or minus 2 days, followed the same procedure, using the same music and survey questions. The only difference was that the researcher(s) conducting the study made sure to switch the pair of headphones the participant used from higher to lower quality (or vice versa) from one listening session to the other, counterbalanced across participants. Since the better and worse headphones we selected looked very similar, this exchange of headphones between sessions was not detected by most participants. At the end of the second session, participants responded to a few additional questions regarding their level of music and sound production knowledge, plus a few other items to ensure that the instructions for the study were clear.
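The counterbalancing described above can be sketched as a small assignment routine. This is only an illustration of one way to balance the two session orders; the function name, the participant IDs, and the seed are hypothetical and are not taken from the study materials.

```python
import random


def assign_headphone_orders(participant_ids, seed=None):
    """Map each participant to a (session 1, session 2) headphone order.

    Participants are shuffled and then alternated between the two orders,
    so the number of people in each order differs by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    orders = [("low", "high"), ("high", "low")]
    return {pid: orders[i % 2] for i, pid in enumerate(shuffled)}


# Hypothetical participant IDs, for illustration only.
schedule = assign_headphone_orders(["P01", "P02", "P03", "P04"], seed=1)
for pid in sorted(schedule):
    print(pid, schedule[pid])
```

Shuffling before alternating keeps the assignment random for any individual while still guaranteeing that roughly half of the sample starts with each headphone type.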

Control Group

At their first session, we confirmed each participant’s eligibility and obtained their consent. The subjects then listened, through a laptop speaker, to one of the two musical compositions that the experimental group heard. The subjects then rated their listening experience by responding to the same 15-item Gricean-based survey and emotional response survey as the experimental group.

The second listening session followed the same procedure as the first. The only change from the first to the second session was the musical composition they heard, which was counterbalanced across participants. Specifically, if they listened to the upbeat song in the first session, then they listened to the slower song in the second session, and vice versa. At the end of the second session, control participants similarly answered the same few questions regarding their level of music and sound production knowledge, etc. as the experimental group.

Analyses and Expected Results

Direct-RT software from Empirisoft will be used to program and collect the survey data, and the ANOVA analyses will be conducted through the statistical software SAS or SPSS. We expect to obtain some statistically significant differences between how the participants rate the lower and higher-quality headphones, although this may not occur for all dimensions and items. We also generally expect to find differences between the control/laptop speaker group’s ratings and the experimental/headphone conditions.
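For the within-subjects comparison of the two headphone types, the simplest case of the planned analysis can be sketched as a paired-samples t-test; with only two conditions, the repeated-measures ANOVA F statistic equals the square of this t. The actual analyses are to be run in SAS or SPSS, so the code below is only an illustrative sketch, and the ratings in it are invented placeholders, not study data.

```python
import math
from statistics import mean, stdev


def paired_t(low, high):
    """Return (t, df) for a paired-samples t-test on high minus low ratings."""
    if len(low) != len(high) or len(low) < 2:
        raise ValueError("need two equal-length lists with at least two participants")
    diffs = [h - l for l, h in zip(low, high)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder numbers invented for illustration; these are NOT study data.
# Each position is one imaginary participant's mean rating per headphone type.
low_quality = [4.0, 3.5, 5.0, 4.5, 3.0]
high_quality = [4.5, 4.5, 5.0, 5.5, 4.0]

t, df = paired_t(low_quality, high_quality)
print(f"t({df}) = {t:.2f}; equivalent repeated-measures F(1, {df}) = {t * t:.2f}")
```

Because each participant serves as their own comparison, the test works on the per-participant differences, which removes stable individual differences in how generously people use the rating scale.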

We intend to run additional analyses to detect potential differences between the ratings of the slower and upbeat songs. Furthermore, we want to see whether the type of headphones used influenced how meaningful the participants found the slow or upbeat song. For instance, it is possible that the high-quality headphones will have no effect on the subjects’ ratings for the upbeat song, but that we will still find an effect for the slower song, or vice versa. Depending on the size and composition of our sample, particularly the number of musicians that end up in our recruitment pool, we may run analyses comparing the musicians’ ratings to the non-musicians’ ratings.

Discussion

We believe that the results of this experiment will have direct implications for headphone consumer choices. If our hypothesis is confirmed, this would mean that investing an extra $100-200 in a good pair of studio headphones can substantially improve how one experiences listening to music. I look forward to publishing the full results of our experiment soon.

 

 

 

 

 

References

Ariana Grande. (2021, April 6). studio footage: vocal arranging the “positions” bridge - ariana grande [Video]. YouTube. https://www.youtube.com/watch?v=Yv8Zih_Lt7o

Ariel Posen. (2019, October 25). Ariel Posen - Familiar ground [Video]. YouTube. https://www.youtube.com/watch?v=53841FJDBR0

Bonneville-Roussy, A., Rentfrow, P. J., Xu, M., & Potter, J. (2013). Music through the ages: Trends in musical engagement and preferences from adolescence through middle adulthood. Journal of Personality and Social Psychology, 105(4), 703–717. https://doi.org/10.1037/a0033770

Burke, E. (2008). A philosophical inquiry into the origin of our ideas of the sublime and beautiful (A. Philips, Ed.). New York, NY: Oxford University Press. (Original work published 1757)

Dead & Company. (2021, September 16). Dead & Company: Sugaree LIVE from Noblesville, IN 9/15/21 [Video]. YouTube. https://www.youtube.com/watch?v=be1POFzQsug

Dolese, M. J., & Kozbelt, A. (2021). Art as communication: Fulfilling Gricean communication principles predicts aesthetic liking. Psychology of Aesthetics, Creativity, and the Arts, 15(4), 673–681. https://doi.org/10.1037/aca0000357

Egermann, H., Sutherland, M. E., Grewe, O., Nagel, F., Kopiez, R., & Altenmüller, E. (2011). Does music listening in a social context alter experience? A physiological and psychological perspective on emotion. Musicae Scientiae, 15(3), 307–323. https://doi.org/10.1177/1029864911399497

Eskine, K. J., Kacinik, N. A., & Prinz, J. J. (2012). Stirring images: Fear, not happiness or arousal, makes art more sublime. Emotion, 12(5), 1071–1074. https://doi.org/10.1037/a0027200

Greenberg, D., Kosinski, M., Stillwell, D., Monteiro, B. L., Levitin, D. J., & Rentfrow, P. J. (2016). The Song Is You. Social Psychological and Personality Science, 7(6), 597–605. https://doi.org/10.1177/1948550616641473

Grewe, O., Nagel, F., Kopiez, R., & Altenmüller, E. (2007). Listening To Music As A Re-Creative Process: Physiological, Psychological, And Psychoacoustical Correlates Of Chills And Strong Emotions. Music Perception, 24(3), 297–314. https://doi.org/10.1525/mp.2007.24.3.297

Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics (Vol. 3, pp. 41–58). Academic Press.

Kirk, R. E. (1956). Learning, a major factor influencing preferences for High‐Fidelity reproducing systems. Journal of the Acoustical Society of America, 28(6), 1113–1116. https://doi.org/10.1121/1.1908573

Olive, S. (2003). Differences in Performance and Preference of Trained versus Untrained Listeners In Loudspeaker Tests: A Case Study. Journal of the Audio Engineering Society, 51(9), 806–825. https://dialnet.unirioja.es/servlet/articulo?codigo=670870

Olive, S. (2012). Some New Evidence that Teenagers and College Students May Prefer Accurate Sound Reproduction. Journal of the Audio Engineering Society. https://www.aes.org/e-lib/browse.cfm?elib=16321

Olive, S., Welti, T., & McMullin, E. (2013). Listener preferences for different headphone target response curves. Journal of the Audio Engineering Society. https://www.aes.org/e-lib/browse.cfm?elib=16768

Olive, S. E., Welti, T., & McMullin, E. (2014). The Influence of Listeners’ Experience, Age, and Culture on Headphone Sound Quality Preferences. Journal of the Audio Engineering Society. https://www.aes.org/e-lib/browse.cfm?elib=17500

Pras, A., Zimmerman, R., Levitin, D. J., & Guastavino, C. (2009). Subjective evaluation of MP3 compression for different musical genres. Journal of the Audio Engineering Society. https://www.aes.org/e-lib/browse.cfm?elib=15074

Roose, H., & Stichele, A. V. (2010). Living Room vs. Concert Hall: Patterns of Music Consumption in Flanders. Social Forces, 89(1), 185–207. https://doi.org/10.1353/sof.2010.0077

Rumsey, F., Zielinski, S., Kassier, R., & Bech, S. (2005). Relationships between experienced listener ratings of multichannel audio quality and naïve listener preferences. Journal of the Acoustical Society of America, 117(6), 3832–3840. https://doi.org/10.1121/1.1904305

Statista. (n.d.-b). Headphones - Worldwide | Statista market forecast. https://www.statista.com/outlook/cmo/consumer-electronics/tv-radio-multimedia/headphones/worldwide

 

 

 

 

 
