Cross-Linguistic Acoustic Characteristics of Phonation: A Machine Learning Approach

Author(s):: Panfili, Laura Maggia
Format:: Thesis
Degree granted:: Ph.D.
Publisher:: Ann Arbor : University of Washington, 2018.
Pages:: 360
Language:: English
Abstract:: Phonation, the process of producing a quasi-periodic sound wave through vocal fold vibration, plays different roles in different languages. Phonation types, or voice qualities, are produced by adjusting the length, thickness, and separation of the vocal folds. In addition to being a complex physiological phenomenon, phonation is also a complex acoustic phenomenon; different phonation types are generally distinguished by a constellation of acoustic properties, and those properties vary from language to language. This dissertation presents a machine learning approach to investigating the role those acoustic properties play in phonation in different languages. This study examines phonation in six languages from four families: English, Gujarati, Hmong, Mandarin, Mazatec, and Zapotec. These languages use phonation in a variety of ways, including contrastively, alongside tones, sociolinguistically, allophonically, and prosodically. Two machine learning algorithms (a Support Vector Machine and a Random Forest) are used to explore which acoustic properties best distinguish phonation types on vowels in each of those six languages. In addition to SVM weights and Random Forest importance, correlations and ablation studies are used in the analysis. Results reveal that while each of the six languages relies on a different subset of acoustic features to distinguish its phonation types, some features are consistently important. Phonation varies enough from language to language that languages should be treated separately for the study of phonation. However, all six languages rely on at least one variant of Harmonics-to-Noise Ratio, as well as Variance of Pitch Tracks, a new measure that takes advantage of the pitch tracking errors commonly found in non-modal phonation. Machine learning was also used to fine-tune a classifier for English phonation types. Unlike other voice quality classifiers, this classifier focuses on just English and on the three-way breathy vs. modal vs. creaky contrast, rather than on a binary creaky vs. non-creaky distinction. The best-performing classifier developed here achieves a weighted F1 score of 0.864, which is on par with state-of-the-art phonation classifiers, even though it performs a more complex task. However, it still struggles with breathy voicing, largely a consequence of data sparsity. This dissertation demonstrates that machine learning is a powerful tool for the study of phonation. It illuminates some of the previously unexamined similarities and differences between phonation types in different languages, and introduces a new measure, Variance of Pitch Tracks, which proves quite useful in machine classification of phonation. In addition to contributing to the understanding of phonation, this dissertation presents a new methodology for its study.
Identifier:: HmongStudies2483