Features of Vocal Frequency Contour and Speech Rhythm in Bipolar Disorder
Article Summary and Review
Serious mental illness is often thought of primarily in terms of psychological symptoms such as sadness, euphoria, hallucinations, and delusions. Recent studies of mood disorders, however, increasingly highlight biological as well as psychological indicators, and for good reason: knowledge of the somatic mechanisms underlying mood regulation is essential for predicting and accurately diagnosing mood disorders. One physical function that has been explored in this context is voice production in affected patients. As noted in the study being reviewed, “almost the whole central nervous system is involved in voice production” (Guidi et al.). Careful analysis of patients' voices in different mental states can therefore provide insight into some of the underlying brain processes. There has been considerable research on the way individuals in a depressed state tend to speak: slower and less intense speech, flat intonation, and a lack of linguistic stress have been identified and confirmed in studies of patients with unipolar depression. In the manic or hypomanic phase of bipolar disorder, it is well known among clinicians (and noted in the DSM-5) that patients often speak in a rapid, “pressured,” and sometimes disorganized manner. Nevertheless, research on voice production in bipolar disorder (BD) is still at a preliminary stage. If there are indeed detectable differences in how people with BD speak across mood states, studying these differences could improve our knowledge of the biological aspects of mood disorders.
Bipolar disorder is characterized by fluctuation between distinct mood states. At times, a person with bipolar disorder may experience a heightened or euphoric state of arousal and high energy known as mania, or hypomania when less severe. Most also go through periods of depression, generally associated with hopelessness and sadness and sometimes with suicidality. (The depression and suicidality are often connected with guilt and shame about actions taken during the manic phase.) A euthymic state refers to a baseline mood, during which some or all symptoms may be in partial or full remission. There can also be a mixed state, in which symptoms of depression and mania are present at the same time. People with this disorder, and those close to them, often experience considerable distress because of its debilitating nature. Because speech production is closely tied to the general state of the mind, it can offer a window into many aspects of a person's mental state. In this vein, the study set out to analyze specific features of vocal frequency and speech rhythm in patients with bipolar disorder.
For this study, eleven patients with bipolar disorder and eighteen healthy controls were selected. Each participant read a neutral text aloud for about four minutes on two separate occasions. The patients with BD were recorded once in a euthymic state and once while experiencing either hypomania or depression, which allowed the two mood states to be compared. The controls were also recorded twice, generally about seven days apart, so that inter-day variability could be assessed. All recordings were made with a high-resolution condenser microphone (AKG Perception P220) so that detailed vocal features could be accurately detected and compared across sessions.
Many methods and tests were applied in this study. To give a sense of how the final results were obtained, I will go through some of the specific methods used. Two sets of voice features were analyzed: “spectral shape features” and “rhythm features.” The former focus on the nature of the sound itself, and the latter on the pauses detected throughout the recordings. Several details were considered in assessing the spectral shape features. For example, audio frames were classified as silent, voiced, or unvoiced based on their relative energy. Another feature examined was the “power spectral density” (PSD), from which the median frequency and peak frequency, as well as the median and peak power amplitudes, were extracted and then used as inputs to the algorithms. For the rhythm features, pauses were sorted into two categories: pauses shorter than 200 ms were classified as “brief pauses,” and pauses longer than 200 ms as “medium-long pauses.” The algorithm used to detect these pauses (Voice Activity Detection, or VAD) did not have a high enough resolution to detect pauses shorter than 10 ms. By combining multiple databases, the measurements obtained from the recordings, and careful statistical analysis, the researchers obtained clear and informative results.
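The frame-labelling and pause-classification steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' VAD algorithm: the 10 ms frame length, the two relative-energy thresholds (`silence_ratio`, `voiced_ratio`), and all function names are hypothetical. Separating voiced from unvoiced frames by energy alone is also a simplification; practical systems usually consider periodicity or zero-crossing rate as well. Pauses are taken to be runs of silent frames between stretches of speech, and a pause of exactly 200 ms is assigned to the medium-long class, since the review does not say which side of the threshold it falls on.

```python
# Minimal sketch of energy-based frame labelling and pause classification.
# All parameter values below are hypothetical, not taken from Guidi et al. (2017).
FRAME_MS = 10             # assumed frame length, matching the ~10 ms VAD resolution
BRIEF_PAUSE_MAX_MS = 200  # pauses shorter than this count as "brief"


def label_frames(energies, silence_ratio=0.1, voiced_ratio=0.5):
    """Label each frame 'silent', 'unvoiced' or 'voiced' by its energy
    relative to the most energetic frame (hypothetical thresholds)."""
    peak = max(energies, default=0.0)
    labels = []
    for e in energies:
        rel = e / peak if peak > 0 else 0.0
        if rel < silence_ratio:
            labels.append("silent")
        elif rel < voiced_ratio:
            labels.append("unvoiced")
        else:
            labels.append("voiced")
    return labels


def find_pauses(labels, frame_ms=FRAME_MS):
    """Return durations (ms) of silent runs that lie between speech frames.
    Leading and trailing silence is not counted as a pause."""
    pauses = []
    run = 0
    seen_speech = False
    for label in labels:
        if label == "silent":
            run += 1
        else:  # voiced and unvoiced frames both count as speech
            if seen_speech and run:
                pauses.append(run * frame_ms)
            run = 0
            seen_speech = True
    return pauses


def classify_pauses(pauses, threshold_ms=BRIEF_PAUSE_MAX_MS):
    """Split pause durations into brief (< threshold) and medium-long (>= threshold)."""
    brief = [p for p in pauses if p < threshold_ms]
    medium_long = [p for p in pauses if p >= threshold_ms]
    return brief, medium_long


def pause_summary(energies, frame_ms=FRAME_MS):
    """Count brief and medium-long pauses and the overall pause rate per minute."""
    pauses = find_pauses(label_frames(energies), frame_ms)
    brief, medium_long = classify_pauses(pauses)
    minutes = len(energies) * frame_ms / 60000
    return {
        "brief": len(brief),
        "medium_long": len(medium_long),
        "pauses_per_minute": len(pauses) / minutes if minutes else 0.0,
    }
```

For example, a made-up energy sequence (not data from the study) with an 80 ms gap and a 350 ms gap between speech segments would yield one brief pause and one medium-long pause; silence before the first and after the last speech frame is ignored.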
The results were in line with expectations. No significant differences were detected between the two recordings of the healthy controls, and no notable differences were found when euthymic recordings were compared with each other. However, when the recordings of patients in a hypomanic state were compared with their euthymic recordings, the vocal frequency and pause rates pointed to a distinctive kind of prosody: a speech dynamic representative of the “pressured speech” often described as a symptom of mania and hypomania. Consistent with previous research, the study also found the speech of depressed patients to be slower and less intense.
Despite the small sample size, the results were clear and consistent across subjects, so the findings are noteworthy and warrant further research and replication. The researchers hope that such investigations will continue and expand. This work could help clinicians in the challenging tasks of making accurate diagnoses and providing helpful treatment for their patients.
Personal Comments
Human language is remarkably complex, and one illustration of this is our ability to change the meaning of what we say merely by changing the prosody or intonation of a sentence or phrase. Studying this paper allowed me to see some of the sophisticated methods that have been developed to measure features like prosody and intonation, as well as how and why the study of speech rhythm can have serious clinical implications. While working on this paper I was also reminded of the “Singing Neanderthal Hypothesis,” one perspective on how human language evolved into its current dynamic form. This study of speech rhythm in BD focuses heavily on the melodic properties of speech and on what those properties might reveal about the speaker. This fits well with the central theme of the Singing Neanderthal Hypothesis, which holds that the musical aspect of language has deep evolutionary roots and is important for understanding how humans communicate today.
Another point of interest is that, although all eighteen healthy controls were native Italian speakers, seven of the eleven patients with BD were French speakers and the other four spoke Italian. As we have discussed in class, different languages can have different prosodic characteristics. For this reason, the researchers checked for inter-language variability to make sure that differences between French and Italian did not affect the analysis of the recordings. They applied the Mann-Whitney U test, a non-parametric test for comparing two independent groups, and detected no statistically significant differences between the language groups.
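The inter-language check described above relies on the Mann-Whitney U test, which ranks the pooled observations from both groups and asks whether one group's ranks are systematically higher than the other's. The review does not say which features were compared or exactly how, so the following is only a generic pure-Python sketch with hypothetical function names: it computes U using average ranks for ties and a two-sided p-value from the normal approximation, without tie or continuity correction. With groups as small as seven and four speakers, an exact test would be more appropriate; SciPy's `scipy.stats.mannwhitneyu` provides both exact and asymptotic versions.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the block of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation
    (no tie or continuity correction). Returns (U, p_value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])  # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p
```

Here `x` and `y` would hold one feature measured in the French-speaking and Italian-speaking groups. A large p-value means that no significant difference was detected, not that the two groups are proven identical.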
Finally, I was fascinated by the precision the researchers achieved, specifically the descriptors they used to quantify how each set of recordings differed from the others. For example, the way they were able to show what “pressured speech” looks like in measurable terms reflected a level of sophistication I had not seen before. The ability to detect and measure such subtle differences in the voice is very important. As mentioned in the discussion section of the study, even though depression and mania seem easily distinguishable, they often share symptoms, such as the marked restlessness known as psychomotor agitation. Such symptoms may make it harder to obtain accurate results in similar studies, since recordings may sound alike even when the subjects are in opposite mood states. By gathering more samples and doing further research, however, future studies may show where the differences in speech actually lie, helping clinicians distinguish between seemingly similar kinds of speech. Because of the complexity of mental illness, it is not uncommon for people to be misdiagnosed or to receive treatment that is not suited to them. The ultimate goal is to make diagnosis and treatment as accurate as possible, and further exploration of speech production in BD may help reduce the suffering involved in the diagnostic and treatment process for this illness.
Original article:
Guidi, A., et al. “Features of Vocal Frequency Contour and Speech Rhythm in Bipolar Disorder.” Biomedical Signal Processing and Control, vol. 37, Elsevier Ltd, 2017, pp. 23–31, https://doi.org/10.1016/j.bspc.2017.01.01