Elsevier

Journal of Voice

Volume 29, Issue 2, March 2015, Pages 140-147

Discriminating Simulated Vocal Tremor Source Using Amplitude Modulation Spectra

https://doi.org/10.1016/j.jvoice.2014.07.020

Summary

Objectives/Hypothesis

Sources of vocal tremor are difficult to categorize perceptually and acoustically. This article describes a preliminary attempt to discriminate vocal tremor sources through the use of spectral measures of the amplitude envelope. The hypothesis is that different vocal tremor sources are associated with distinct patterns of acoustic amplitude modulations.

Study Design

Statistical categorization methods (discriminant function analysis) were used to discriminate signals from simulated vocal tremor with different sources using only acoustic measures derived from the amplitude envelopes.

Methods

Simulations of vocal tremor were created by modulating parameters of a vocal fold model corresponding to oscillations of respiratory driving pressure (respiratory tremor), degree of vocal fold adduction (adductory tremor), and fundamental frequency of vocal fold vibration (F0 tremor). The acoustic measures were based on spectral analyses of the amplitude envelope computed across the entire signal and within select frequency bands.

Results

The signals could be categorized (with accuracy well above chance) in terms of the simulated tremor source using only measures of the amplitude envelope spectrum even when multiple sources of tremor were included.

Conclusions

These results supply initial support for an amplitude-envelope-based approach to identify the source of vocal tremor and provide further evidence for the rich information about talker characteristics present in the temporal structure of the amplitude envelope.

Introduction

Vocal tremor is a voice disorder that is characterized by an unstable or shaky-sounding voice1 and measurable modulation of the acoustic output.2, 3, 4, 5, 6, 7, 8, 9, 10 These perceptual and acoustical characteristics are produced by tremor affecting components of the speech mechanism including the respiratory system,11, 12, 13 larynx,2, 3, 6, 7, 9, 10, 12, 14, 15 and vocal tract.2, 4, 7, 11, 16, 17, 18, 19 Tremor is associated with several different neurologic disorders including essential tremor, Parkinson disease, cerebellar dysfunction, and dystonia.20 In individuals with essential tremor, the most common tremor disorder, vocal tremor is estimated to occur in approximately 18–30% of cases.19, 21, 22

Previous research on essential vocal tremor has demonstrated that tremor affecting the structures within the speech mechanism produces nearly rhythmic modulation of the fundamental frequency (F0) and the intensity of the voice during sustained vowel production.2, 3, 4, 5, 6, 7, 8, 9, 10 The primary focus of this research was on measuring the modulation rate (ie, the number of cycles of modulation that occur within 1 second) and the modulation extent (ie, the range of modulation) of F0 and intensity. Dromey et al5 reported that the rate of F0 modulation ranged from 3.2 to 5.3 Hz and, similarly, the rate of intensity modulation ranged from 2.6 to 5.0 Hz during sustained vowels produced at a comfortable pitch and loudness by individuals with essential vocal tremor. The extent of F0 modulation in this study ranged from 2.9% to 15.0%, whereas the extent of intensity modulation ranged from 18.5% to 55.6%. In a study of respiratory and laryngeal vocal tremor using acoustic analyses and electromyography, Koda and Ludlow12 found that the mean rate of modulation of the acoustic signal was 4.9 Hz. The rate of the acoustical modulations was consistent with the rate of the measured physiological modulations. That is, the mean rate of modulation of muscle activation of the two primary intrinsic laryngeal muscles involved in F0 control was 4.7 Hz in the thyroarytenoid and 5.1 Hz in the cricothyroid. For the same participants, the mean rate of modulation of the respiratory structures, measured using respiratory inductive plethysmography, was 4.6 Hz.12 Measurements of both the rate and the extent of F0 and intensity modulation varied when individuals produced different pitches and loudness levels.5
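The two quantities defined above, modulation rate and modulation extent, can be illustrated with a minimal sketch on a synthetic contour. The function names, the sinusoidal contour, and the numeric values are hypothetical. The extent formula (half the peak-to-peak range relative to the mean) is one possible convention and is assumed here; it is not necessarily the definition used in the cited studies.

```python
import math

def synth_contour(mean, rate_hz, extent_frac, dur_s=2.0, fs=200, phase=0.3):
    """Hypothetical contour: mean * (1 + extent_frac * sin(2*pi*rate_hz*t + phase))."""
    n = int(dur_s * fs)
    return [mean * (1.0 + extent_frac * math.sin(2 * math.pi * rate_hz * i / fs + phase))
            for i in range(n)]

def modulation_rate(contour, fs):
    """Cycles per second, estimated by counting upward crossings of the mean."""
    mean = sum(contour) / len(contour)
    upward = sum(1 for a, b in zip(contour, contour[1:]) if a < mean <= b)
    return upward / (len(contour) / fs)

def modulation_extent_percent(contour):
    """Half the peak-to-peak range as a percentage of the mean (assumed convention)."""
    mean = sum(contour) / len(contour)
    return 100.0 * (max(contour) - min(contour)) / (2.0 * mean)

# Illustrative values only: a 120 Hz mean F0 modulated at 5 Hz by +/-5%.
f0 = synth_contour(mean=120.0, rate_hz=5.0, extent_frac=0.05)
rate = modulation_rate(f0, fs=200)
extent = modulation_extent_percent(f0)
```

Counting mean crossings is adequate for a clean, nearly sinusoidal contour like this one; a measured F0 or intensity contour would likely need smoothing first.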

In the majority of studies on essential vocal tremor, either the involvement of each component of the speech mechanism was not identified or multiple components of the speech mechanism were affected by tremor. As a result, it is uncertain whether specific acoustic modulation patterns are associated with tremor affecting the respiratory system, the larynx, or the vocal tract (for a review of possible contributions of each component of the speech mechanism to vocal tremor, see Lester, Barkmeier-Kraemer, and Story7).

Different methods have been proposed to improve acoustic analysis of vocal tremor for clinical identification and characterization of the source of vocal tremor, including the vocal demodulator23 and the modulogram.24 The vocal demodulator measured the extent and rate of F0 modulation and of F0 amplitude modulation, with a range of modulation rate limited to 2.5 to 25 Hz. As an extension of the vocal demodulator, the modulogram analyzed the rate and extent of modulation of F0 and overall amplitude with three distinct rate bands: flutter (10–20 Hz), tremor (2–10 Hz), and wow (0.2–2 Hz). The vocal demodulator and modulogram were both used to analyze vocal tremor, and the modulogram was used to measure change in vocal stability pre- and post-botulinum toxin (Botox) injections in individuals with vocal tremor.14 However, it is difficult to clinically apply the results of these studies because these methods were used to analyze vocal tremor in individuals with a variety of neurologic disorders affecting more than one component of the speech mechanism. Without knowing the specific involvement of the respiratory system, larynx, and vocal tract in vocal tremor and their direct effect on the acoustic patterns, it continues to be challenging to identify the source of vocal tremor in individuals with various neurologic etiologies.
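As a small illustration of the three modulogram rate bands named above, the following sketch maps a modulation rate to its band. The band edges come from the text; treating each band as a half-open interval (lower edge included, upper edge excluded) is an assumption, as are the names `BANDS` and `rate_band`.

```python
# Band edges in Hz as listed in the text; half-open intervals are an assumption.
BANDS = (
    ("wow", 0.2, 2.0),
    ("tremor", 2.0, 10.0),
    ("flutter", 10.0, 20.0),
)

def rate_band(rate_hz):
    """Return the name of the band containing rate_hz, or None if outside all bands."""
    for name, low, high in BANDS:
        if low <= rate_hz < high:
            return name
    return None

labels = [rate_band(r) for r in (0.5, 4.9, 15.0, 25.0)]
```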

To our knowledge, only two studies have systematically investigated the acoustic patterns of vocal tremor and the underlying physiology. Jiang et al25 isolated the source of tremor to subglottal, glottal, and supraglottal levels using an electric tapping device on the chest, side of the throat, and cheek during phonation to simulate vocal tremor in healthy adults. Acoustic analyses included measures of F0, percent jitter (ie, cycle-to-cycle frequency perturbation), and percent shimmer (ie, cycle-to-cycle amplitude perturbation). In addition, the frequency and amplitude contours of the acoustic signal were extracted, and peak prominence ratios were calculated for the contours by determining the energy of each peak and dividing it by the total energy within the signal. The peak prominence ratios derived from the frequency contour distinguished between the normal condition and the three simulated tremor conditions. The peak prominence ratio derived from the amplitude contour distinguished between all pairs of tremor conditions, except the chest and throat conditions (ie, the subglottal and glottal conditions). These results indicated that extraction of amplitude contours to derive power spectra is a useful technique in discriminating different sources of tremor. However, Lester and Story26 demonstrated that individuals respond to forced oscillation of the respiratory apparatus with adaptations at the level of the larynx (ie, changes in the mean F0 with different magnitudes of applied pressure oscillation). Therefore, a more controlled study of the characteristics of amplitude envelope modulation for isolated sources of tremor was warranted.
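The peak prominence ratio described above (energy of a peak divided by total energy) can be sketched as follows. This is a simplified reading, not the procedure of Jiang et al: it assumes the energy of a peak is the power in a single spectral bin, removes the mean before the transform, and reports only the largest peak. All names and the synthetic contours are hypothetical.

```python
import cmath
import math

def power_spectrum(x):
    """Power at positive frequencies of a mean-removed sequence (direct DFT)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    return [abs(sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(1, n // 2 + 1)]

def peak_prominence_ratio(x):
    """Power of the largest spectral peak divided by total spectral power."""
    spectrum = power_spectrum(x)
    total = sum(spectrum)
    return max(spectrum) / total if total > 0 else 0.0

def contour(rates_hz, depth=0.2, dur_s=2.0, fs=100):
    """Hypothetical contour: unit baseline plus equal-depth sinusoids at rates_hz."""
    n = int(dur_s * fs)
    return [1.0 + sum(depth * math.sin(2 * math.pi * r * i / fs) for r in rates_hz)
            for i in range(n)]

single = peak_prominence_ratio(contour([5.0]))        # one modulation rate
double = peak_prominence_ratio(contour([5.0, 12.0]))  # two equal-depth rates
```

A contour dominated by one modulation rate yields a ratio near 1, and the ratio falls as energy spreads across several rates.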

In an attempt to completely isolate the source of tremor to the larynx and to determine the associated acoustic characteristics, Lester et al7 simulated oscillations affecting only fundamental frequency to simulate vocal tremor affecting fold length (throughout this article, we use “source” to refer to the proximal cause of the acoustic perturbations as opposed to the causal “source” of the tremor, which is neural). Simulations were created using a kinematic model of the vocal folds27, 28 coupled to a wave-reflection model of the trachea and a parametric model of the vocal tract area function (J. Liljencrants, Unpublished doctoral dissertation, 1985; B. H. Story, Unpublished doctoral dissertation, 1995).29 Acoustic analyses of this simulated vocal tremor revealed comodulation of the F0 and overall intensity of the acoustic signal. Spectral analyses of the amplitude envelope were not investigated in this study nor have they been conducted with simulated vocal tremor affecting other parts of the speech mechanism.

In summary, vocal tremor can be caused by isolated or combined sources of tremor within the speech mechanism. Each source may produce modulation of the F0 and intensity of the resultant waveform. It is unclear whether measures of the rate or extent of acoustic modulation can distinguish the source of vocal tremor. However, spectral analyses of the amplitude envelope may provide discriminating information about source.25

It has long been known that rhythmic modulations of the amplitude envelope in speech are important for intelligibility. Filtering or adding noise to the amplitude envelope in the frequency range of 2–9 Hz substantially decreases intelligibility.30, 31, 32 There has been a recent resurgence in scientific interest in the modulations of the amplitude envelope as it has been suggested that neural oscillations may entrain to the rhythmic modulations of the envelope as a means of tuning perceptual processing to the intrinsic rhythms of speech (eg, studies by Ghitza,32, 33 Peelle and Davis,34 Peelle et al,35 and Luo36).

In addition to providing information about the linguistic content of the signal, the rhythmic characteristics of the amplitude envelope in speech may carry information about the talker. Measurements derived from the amplitude envelope have been demonstrated to differ based on the language spoken by a talker (eg, Korean vs Mandarin37) and can even distinguish individual speakers within a single language.38, 39 Such measures have also been shown to be effective in distinguishing clinically relevant categories of speech perturbations. Liss et al40 proposed that the power spectra of the amplitude envelope for speech computed for the entire signal and for octave frequency bands, which they called the envelope modulation spectrum (EMS), would contain relevant information about the rhythmic disruptions that are a common symptom of dysarthria. In support of this proposal, they computed a number of metrics (such as the peak frequency in the spectrum and the relative energy in the spectrum above and below 4 Hz) and demonstrated that one could reliably categorize speakers into subtypes of dysarthria (eg, ataxic, hypokinetic, etc.). In a discriminant analysis using just these metrics, they were able to classify 43 speakers across five subtypes of dysarthria with 84% accuracy.
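To make the idea of an envelope modulation spectrum concrete, here is a minimal sketch for the full-signal case only (no octave bands). It is not the implementation of Liss et al. The envelope is obtained by rectification and block averaging, which is a stand-in technique assumed for simplicity, and the two metrics shown (peak frequency and fraction of energy below 4 Hz) follow the examples mentioned in the text. All function names, the test signal, and its parameters are hypothetical.

```python
import cmath
import math

def amplitude_envelope(signal, block):
    """Rectify, then average over non-overlapping blocks (smooths and downsamples)."""
    return [sum(abs(v) for v in signal[i:i + block]) / block
            for i in range(0, len(signal) - block + 1, block)]

def modulation_spectrum(env, fs_env):
    """Frequencies (Hz) and power of the mean-removed envelope, via a direct DFT."""
    n = len(env)
    mean = sum(env) / n
    centered = [v - mean for v in env]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs_env / n)
        power.append(abs(coeff) ** 2)
    return freqs, power

def ems_metrics(freqs, power, split_hz=4.0):
    """Peak modulation frequency and fraction of energy below split_hz."""
    total = sum(power)
    below = sum(p for f, p in zip(freqs, power) if f < split_hz)
    return {"peak_hz": freqs[power.index(max(power))],
            "frac_below_split": below / total}

# Hypothetical test signal: 200 Hz carrier, amplitude modulated at 5 Hz, 2 s at 2000 Hz.
fs, block = 2000, 20
signal = [(1.0 + 0.3 * math.sin(2 * math.pi * 5.0 * i / fs))
          * math.sin(2 * math.pi * 200.0 * i / fs) for i in range(2 * fs)]
env = amplitude_envelope(signal, block)
freqs, power = modulation_spectrum(env, fs_env=fs / block)
metrics = ems_metrics(freqs, power)
```

The band-limited variant would apply the same steps to band-pass filtered copies of the signal. A Hilbert-transform envelope is a common alternative to the rectify-and-average step used here.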

The success of the EMS approach in classifying dysarthric subtypes suggests that similar amplitude envelope metrics may be able to classify vocal tremor sources from the acoustic signal, especially given that the major acoustic consequences of tremor have a strong temporal structure. To determine the utility of this approach for tremor, we follow the empirical logic of Liss et al40 using discriminant function analysis to categorize signals solely on the basis of metrics derived from the amplitude envelope spectra.

Whereas Liss et al40 had previous dysarthric subtype diagnoses for each of their speakers to use as a basis for determining classification accuracy, it is difficult to provide a definitive diagnosis of vocal tremor source in many patients. Therefore, we followed the approach of Lester et al7 and used simulations of vocal tremor as the basis for our two experiments. The use of simulated vocal tremor provides us the benefits of (i) knowing with certainty the source of the tremor; (ii) the ability to create tremor with isolated sources or compound sources; and (iii) the ability to independently vary parameters of the tremor such as the rate of modulation, extent of modulation, and mean F0.

A set of sustained vowels was simulated by the computational model (described in the following) with different isolated tremor sources (Experiment 1) or with compound sources (Experiment 2). The EMS measures described by Liss et al40 were calculated for each synthesized signal. These measures were then entered into a stepwise discriminant function analysis (DFA) with the vocal tremor source serving as the grouping variable. If there is information in the amplitude envelope spectra for distinguishing vocal tremor sources, then one would predict that the classification accuracy of the discriminant analysis will be above chance. On the other hand, if different vocal tremor sources do not result in discriminably different amplitude envelope patterns, one would not expect the discriminant analysis to provide classification accuracy above chance. If the former prediction is supported, it would provide motivation for further exploration of the diagnostic possibilities of acoustic analyses of vocal tremor.
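The logic of comparing classification accuracy against chance can be sketched with a much simpler classifier. The example below uses a leave-one-out nearest-centroid rule as a stand-in; it is not a stepwise DFA, and the leave-one-out scheme is an assumption rather than the validation procedure of the study. The feature values are invented toy numbers with generic labels and do not represent any measured or simulated result.

```python
def loo_nearest_centroid_accuracy(samples):
    """Leave-one-out accuracy; samples is a list of (feature_tuple, label)."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        groups = {}
        for other_features, other_label in rest:
            groups.setdefault(other_label, []).append(other_features)
        # Centroid of each group: per-dimension mean of its remaining members.
        centroids = {lab: [sum(col) / len(col) for col in zip(*members)]
                     for lab, members in groups.items()}
        predicted = min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(features, centroids[lab])))
        correct += predicted == label
    return correct / len(samples)

def chance_level(samples):
    """Accuracy obtained by always guessing the most frequent label."""
    counts = {}
    for _, label in samples:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / len(samples)

# Invented toy features (two metrics per signal) for three generic groups.
toy = [((1.0, 0.10), "source_a"), ((1.2, 0.20), "source_a"), ((0.9, 0.15), "source_a"),
       ((3.0, 0.80), "source_b"), ((3.2, 0.90), "source_b"), ((2.9, 0.70), "source_b"),
       ((5.0, 0.30), "source_c"), ((5.1, 0.20), "source_c"), ((4.8, 0.40), "source_c")]
accuracy = loo_nearest_centroid_accuracy(toy)
chance = chance_level(toy)
```

With equal group sizes the chance level is one divided by the number of groups. With unequal groups, the most-frequent-label baseline used here is the stricter comparison.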

Section snippets

Tremor database

The simulations of vocal tremor were generated using a kinematic model of the voice source27, 28 coupled to a wave-reflection model of the trachea and vocal tract.41 This model allowed for control of the pressure supplied to the larynx, the fundamental frequency (corresponding to vocal fold length), and the degree of vocal fold adduction. This model also allowed for control of the configuration of the vocal tract filter for vowel shaping via a parametric model of the vocal tract area function.29
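The three controllable parameters described above suggest how isolated and compound tremor conditions might be specified. The sketch below generates sinusoidally modulated parameter tracks for the selected sources and leaves the others constant. It does not reproduce the kinematic or wave-reflection models; the normalized baselines, the names `TremorCondition` and `parameter_tracks`, and the sinusoidal modulation shape are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

# Normalized baselines (hypothetical); real model units are not given here.
BASELINE = {"pressure": 1.0, "f0": 1.0, "adduction": 1.0}

@dataclass(frozen=True)
class TremorCondition:
    sources: tuple    # parameters to modulate, e.g. ("adduction",) or ("adduction", "f0")
    rate_hz: float    # modulation rate
    extent: float     # modulation depth as a fraction of baseline

def parameter_tracks(cond, dur_s=1.0, fs=100):
    """Return one time series per parameter; only the listed sources are modulated."""
    n = int(dur_s * fs)
    tracks = {}
    for name, base in BASELINE.items():
        depth = cond.extent if name in cond.sources else 0.0
        tracks[name] = [base * (1.0 + depth * math.sin(2 * math.pi * cond.rate_hz * i / fs))
                        for i in range(n)]
    return tracks

# A compound condition: adductory plus F0 tremor at 5 Hz with 10% depth.
compound = parameter_tracks(TremorCondition(("adduction", "f0"), rate_hz=5.0, extent=0.1))
```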

Experiment 2

Whereas the results of Experiment 1 demonstrated that amplitude envelope spectra measures can discriminate isolated sources of vocal tremor, the situation in real patients is likely to be much more complex. In particular, multiple sources of vocal tremor are likely to co-occur. It is of practical and theoretical relevance whether there is sufficient information in the EMS metrics to detect a particular source of tremor whether it is present in isolation or in combination with other sources. To

Results

The goal of this analysis was to test the sensitivity of EMS measures in classifying the presence of an individual source of tremor (ie, adductory), even when presented in combination with another source. Thus, the adductory, adductory plus F0, and adductory plus respiratory were considered one group with all other tremor types being considered members of the second group. Statistical analyses used in Experiment 1 were also applied to the computed variables in Experiment 2. Two discriminating

Discussion

The purpose of the two experiments reported here was to explore the possibility that information about the source of a vocal tremor may reside in temporal regularities in the amplitude envelope for speech. If such information existed and was robust, it would provide not only an additional noninvasive tool for diagnosis of vocal tremor source but also provide a mapping of acoustic perturbations in speech output to their physiological source, which could be useful in therapy. To determine whether

Conclusions

The experiments presented here provide evidence that information about vocal tremor source is potentially available in the amplitude envelopes of the full signal and select frequency bands of the speech signal. Whether such information is robust enough to serve as a guide for diagnosis and therapy will require additional experiments with increased variability of signals and variables that are more specifically designed for the particular classification task.

Acknowledgments

This work was supported by a research grant from the National Institute on Deafness and Other Communication Disorders (Grant 5 R01 DC 4674) to the fourth author.

References (43)

  • M. Bove et al.

    Development and validation of the vocal tremor scoring system

    Laryngoscope

    (2006)
  • J.R. Brown et al.

    Organic voice tremor: a tremor of phonation

    Neurology

    (1963)
  • C. Dromey et al.

    The influence of pitch and loudness changes on the acoustics of vocal tremor

    J Speech Lang Hear Res

    (2002)
  • E.M. Finnegan et al.

    Increased stability of airflow following botulinum toxin injection

    Laryngoscope

    (1999)
  • P. Warrick et al.

    Botulinum toxin for essential tremor of the voice with multiple anatomical sites of tremor: a crossover design study of unilateral versus bilateral injection

    Laryngoscope

    (2000)
  • V.C. Hachinski et al.

    The nature of primary vocal tremor

    Can J Neurol Sci

    (1975)
  • J. Koda et al.

    An evaluation of laryngeal muscle activation in patients with voice tremor

    Otolaryngol Head Neck Surg

    (1992)
  • H. Tomoda et al.

    Voice tremor: dysregulation of voluntary expiratory muscles

    Neurology

    (1987)
  • C.H. Adler et al.

    Botulinum toxin type A for treating voice tremor

    Arch Neurol

    (2004)
  • M.F. Brin et al.

    Movement disorders of the larynx

  • P. Gonzalez-Alegre et al.

    Isolated high-frequency jaw tremor relieved by botulinum toxin injections

    Mov Disord

    (2006)