Speech Perception
- The Study of Speech Sounds
- Psychological Background
- A Computational Analysis of Speech Perception
- Models of Speech Perception
- Speech Perception and the Brain
The Study of Speech Sounds
- Acoustic Phonetics
- Articulatory Phonetics
- Phonology
Acoustic Phonetics
- What are the physical properties of speech?
- Spectrograms
- Cues
What are the physical properties of speech?
Spectrograms
- Time (Horizontal Axis)
- Frequency (Vertical Axis)
- Intensity
- Formants
- Numbering
- Importance of the First Two Formants
Cues
- Context Independent (stressed vowels)
- Context Dependent (e.g., consonants)
Articulatory Phonetics
- How are speech sounds produced?
- Consonants
- Vowels
How are speech sounds produced?
- The lungs provide a flow of air.
- The larynx vibrates.
- The shape of the mouth can be altered to change the sound.
- The tongue can be used to restrict the flow of air.
- Air can be redirected through the nasal passage.
Consonants
- Place of Articulation
- Manner of Articulation
- Voicing
Place of Articulation
- Bi-Labial (pin)
- Labio-Dental (fin)
- Dental (thin)
- Alveolar (zin)
- Palatal (chin)
- Velar (kin)
- Glottal (hin)
Manner of Articulation
- Stops (tin)
- Fricatives (sin)
- Affricates (chin)
- Nasals (min)
- Laterals (lin)
- Semivowels (win)
Voicing
- Voiced (bin)
- Unvoiced (pin)
Vowels
- Height of Tongue
- High (beet)
- Mid (bait)
- Low (bat)
- Part of the Tongue Involved
- Front (bat)
- Central (but)
- Back (pot)
Phonology
- What are the basic units of speech?
- How do speech sounds change when you combine them?
What are the basic units of speech?
- Phones are physically different speech sounds (vowels or consonants).
- Phonemes are the categories into which we classify phones.
- Allophones are different phones that we place in the same phoneme category.
- Phonemes and allophones are language specific.
How do speech sounds change when you combine them?
- Example: To make a noun plural, add an "s".
- The sound of the "s" depends on the context.
- glass --> glass+ez
- lip --> lip+s
- pig --> pig+z
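The plural rule above can be sketched as a small function. This is a minimal illustration, assuming rough ASCII stand-ins for sounds and two hand-picked sound classes (SIBILANTS and VOICELESS are hypothetical names, not a complete phoneme inventory).

```python
# Illustrative sound classes written with rough ASCII symbols.
# These sets are assumptions for the sketch, not a full inventory.
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}
VOICELESS = {"p", "t", "k", "f", "th"}


def plural_suffix(final_sound: str) -> str:
    """Pick the plural ending from the last sound of the noun."""
    if final_sound in SIBILANTS:
        return "ez"  # glass --> glass+ez
    if final_sound in VOICELESS:
        return "s"   # lip --> lip+s
    return "z"       # pig --> pig+z (any other voiced sound)


for noun, last in [("glass", "s"), ("lip", "p"), ("pig", "g")]:
    print(f"{noun} --> {noun}+{plural_suffix(last)}")
```

The order of the checks matters: "s" is both a sibilant and unvoiced, so the sibilant test has to come first.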
Psychological Background
- Stages of Speech Perception
- The Importance of Top-Down Processing
- The Motor Theory
Stages of Speech Perception
- The Auditory Stage
- The Phonetic Stage
- The Phonological Stage
The Auditory Stage
- Segmentation
- The speech stream has to be segmented into phones.
- Identification of Features
- Invariant acoustic features (those associated with stressed vowels).
- What other features are important?
- Errors in phoneme identification.
Errors in Phoneme Identification
The Phonetic Stage
- Identify the Phoneme
- Categorical Perception
- IV is voice onset time.
- DVs are phoneme identification ("P" or "B") and ABX discrimination.
- Results are language specific.
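A rough sketch of what categorical perception predicts: identification follows a steep function of voice onset time, and two stimuli are easy to discriminate only when they fall on opposite sides of the category boundary. The logistic form, the 25 ms boundary, and the slope are illustrative assumptions, not fitted values.

```python
import math


def prob_p(vot_ms: float, boundary_ms: float = 25.0, slope: float = 0.5) -> float:
    """Probability of labeling a stimulus "P" given its voice onset time.

    boundary_ms and slope are assumed values chosen for illustration.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


def label(vot_ms: float) -> str:
    """Identification response: "P" or "B"."""
    return "P" if prob_p(vot_ms) >= 0.5 else "B"


def crosses_boundary(vot_a: float, vot_b: float) -> bool:
    """True when the two stimuli receive different labels."""
    return label(vot_a) != label(vot_b)


# Same 10 ms physical difference, different predicted discriminability.
print(crosses_boundary(20, 30))  # pair straddles the boundary
print(crosses_boundary(40, 50))  # pair lies within one category
```

Because the results are language specific, the boundary would have to be set separately for each language.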
The Phonological Stage
- Use rules of combination to check consistency.
- Day's Dichotic Listening Task
- banket + lanket --> blanket
- Replication?
The Importance of Top-Down Processing
- Only 50% of the words in a tape recording can be identified out of context.
- Miller & Isard (1963)
- Warren & Warren (1970)
Miller & Isard (1963)
- Both syntactic and semantic constraints improve auditory word recognition.
- Accidents kill motorists on the highway. (+syntax, +semantics)
- Accidents carry honey between the house. (+syntax, -semantics)
- Around accidents country honey the shoot. (-syntax, -semantics)
Warren & Warren (1970)
- Context alters phoneme identification.
- It was found that the *eel was on the axle.
- It was found that the *eel was on the shoe.
- It was found that the *eel was on the orange.
- It was found that the *eel was on the table.
The Motor Theory
- Analysis by Synthesis
- People understand speech by figuring out how to reproduce the speech stream.
- The Motor Theory of Speech Perception
- An extreme version of analysis by synthesis.
- Assumes involvement of the articulatory mechanism.
- Problems with Motor Theory
- The McGurk Effect
The McGurk Effect
- /ba/ + /ga/ --> /da/
- Hear /ba/
- See /ga/
- Perceive /da/
- From the UCSC Perceptual Sciences Lab Web Site
- What does this tell us?
A Computational Analysis of Speech Perception
- What is the input?
- Frequency, intensity, and time.
- Context and knowledge of how sounds are produced must play a role.
- What is the goal?
- Classification of phonemes.
- What strategy is used to achieve the goal with the available input?
- Interactive processing is a must!
- Parallel processing is necessary to stay within the 100-step maximum.
Models of Speech Perception
- HEARSAY (Reddy & Newell, 1974)
- TRACE (McClelland & Elman, 1986)
HEARSAY (Reddy & Newell, 1974)
- The Design of HEARSAY
- The HEARSAY Architecture
- The Semantic Component
- The Syntactic Component
- A Parsing Example
- The Lexical Component
- The Phonemic Component
- The Parametric Component
- Questions about HEARSAY
- Evaluating HEARSAY
The Design of HEARSAY
- HEARSAY is a computer program.
- HEARSAY was designed to show that computers can understand speech well enough to do something with it.
- HEARSAY functions in the restricted domain of voice chess.
- HEARSAY consists of independent but cooperating components.
- Each component is able to generate, reject and rank order hypotheses.
- All communication is through a "blackboard".
- HEARSAY uses procedural representations.
- HEARSAY uses the same information found in a spectrogram.
- HEARSAY gives us a preview of where we are headed in this course!
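The blackboard idea in the list above can be sketched in a few lines: components post scored hypotheses to a shared structure, and any component can reject or rank them. This is a toy illustration, not HEARSAY's actual code; the class names, the example words other than "takes", and all scores are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    source: str    # component that proposed it (hypothetical label)
    content: str   # e.g. a candidate word
    score: float   # higher means more plausible


@dataclass
class Blackboard:
    hypotheses: list = field(default_factory=list)

    def post(self, hyp: Hypothesis) -> None:
        """Generate: add a hypothesis for every component to see."""
        self.hypotheses.append(hyp)

    def reject(self, is_bad) -> None:
        """Reject: drop every hypothesis for which is_bad returns True."""
        self.hypotheses = [h for h in self.hypotheses if not is_bad(h)]

    def ranked(self) -> list:
        """Rank order: best hypothesis first."""
        return sorted(self.hypotheses, key=lambda h: h.score, reverse=True)


board = Blackboard()
board.post(Hypothesis("lexical", "takes", 0.7))
board.post(Hypothesis("lexical", "rook", 0.9))
board.post(Hypothesis("lexical", "to", 0.4))

# Another component might rule out a word that cannot come next.
board.reject(lambda h: h.content == "rook")
print([h.content for h in board.ranked()])
```

The components never call each other directly; they only read from and write to the board, which is what keeps them independent but cooperating.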
The HEARSAY Architecture
The Semantic Component
- HEARSAY generates an ordered (best to worst) set of legal moves based on:
- the rules of chess
- the current board position
- a user model (assumes a rational opponent)
- The current state of the conversation is used to eliminate moves (e.g., "capture" eliminates non-capture moves).
The Syntactic Component
- HEARSAY's grammar consists of 18 rewrite rules such as:
- move --> move1 + check_word OR move1
- move1 --> regular_move OR capture OR castle
- capture --> man_loc + capture_word + man_loc OR . . .
- capture_word --> "takes" OR . . .
- These rules can generate more than 5 million sentences.
- A "generate-and-test" procedure is used to predict the next word.
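Rewrite rules like the ones listed above can be stored as a table and applied mechanically. The sketch below encodes only the alternatives shown in the outline; the alternatives the outline elides are left out, and symbols without a rule here (such as man_loc) are simply left unexpanded. The function name expand is hypothetical.

```python
# Each symbol maps to its list of alternatives; each alternative is a
# sequence of symbols. Only the alternatives shown in the outline appear;
# the elided ones are deliberately not guessed.
RULES = {
    "move": [["move1", "check_word"], ["move1"]],
    "move1": [["regular_move"], ["capture"], ["castle"]],
    "capture": [["man_loc", "capture_word", "man_loc"]],
    "capture_word": [["takes"]],
}


def expand(symbols, rules=RULES):
    """Rewrite the leftmost symbol that has a rule, once per alternative.

    Returns a list of new symbol sequences, or an empty list when no
    symbol in the sequence has a rule in this partial table.
    """
    for i, sym in enumerate(symbols):
        if sym in rules:
            return [symbols[:i] + alt + symbols[i + 1:] for alt in rules[sym]]
    return []


print(expand(["move"]))
print(expand(["man_loc", "capture_word", "man_loc"]))
```

Applying expand repeatedly enumerates the sentences the grammar allows, which is the "generate" half of a generate-and-test procedure; the "test" half would compare each candidate against the incoming speech.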
A Parsing Example
The Lexical Component
- HEARSAY has a lexicon of 31 chess words.
- Each lexical entry includes:
- a phonemic description of the word
- its stress pattern
- its grammatical category
- a procedural representation of its meaning
The Phonemic Component
- Characteristics (Acoustic Features) of Phonemes
- Rules for Dealing with Missing or Extra Segments
- Juncture Rules
- Rules for Distinguishing Pairs
- Uniquely Identifiable Sounds (Stressed Vowels)
- Phoneme Boundaries
The Parametric Component
- Speaker Characteristics
- Allophonic Variability and Context
Questions about HEARSAY
- Is the representation local or distributed?
- How many levels of representation are assumed?
- What processes are assumed?
- Are the processes bottom-up, top-down, or interactive?
- Are the processes sequential or parallel within levels?
- Are the processes sequential or parallel between levels?
- Are the processes controlled or automatic?
- Are the processes symbolic or sub-symbolic?
- Are the structures and processes modular?
Evaluating HEARSAY
- HEARSAY was evaluated on 19 utterances containing 101 words.
- The intact model was correct on
- 88% of the words
- 46% of the sentences
- Without the semantic component it was correct on
- 65% of the words
- 14% of the sentences
- Without the semantic or syntactic components it was correct on
- 40% of the words
- 0% of the sentences
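The size of each component's contribution can be read off as the drop in accuracy when it is removed. The snippet below only does that subtraction on the percentages reported above; the dictionary keys and the helper name drop are hypothetical labels.

```python
# Accuracy in percent, copied from the evaluation results above.
RESULTS = {
    "intact": {"words": 88, "sentences": 46},
    "no semantics": {"words": 65, "sentences": 14},
    "no semantics or syntax": {"words": 40, "sentences": 0},
}


def drop(metric: str, before: str, after: str) -> int:
    """Percentage points lost on a metric when going from one configuration to another."""
    return RESULTS[before][metric] - RESULTS[after][metric]


# Points lost by removing the semantic component.
print(drop("words", "intact", "no semantics"),
      drop("sentences", "intact", "no semantics"))

# Further points lost by also removing the syntactic component.
print(drop("words", "no semantics", "no semantics or syntax"),
      drop("sentences", "no semantics", "no semantics or syntax"))
```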
TRACE (McClelland & Elman, 1986)
- An interactive activation model for speech recognition.
- Three levels of representation.
- Distinctive Features X Time
- Articulatory properties that influence perception.
- Example: +/- voicing
- Phonemes X Time
- Words
- No attempt is made to segment the speech stream prior to phoneme identification.
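A toy sketch of interactive activation, much smaller than TRACE itself: two phoneme units (/b/ and /p/) inhibit each other, and a single word unit is excited by /p/ and feeds activation back down to it. The feature level and the time dimension are omitted, and every parameter value is an assumption chosen for illustration.

```python
def clamp(x: float) -> float:
    """Keep an activation inside the range 0..1."""
    return max(0.0, min(1.0, x))


def settle(evidence_b, evidence_p, feedback=0.5, inhibition=0.3,
           rate=0.2, steps=30):
    """Run the toy network and return the final (b, p, word) activations.

    evidence_b / evidence_p: bottom-up support for each phoneme.
    feedback: strength of the top-down word-to-phoneme connection.
    inhibition: strength of competition between the two phonemes.
    """
    b = p = word = 0.0
    for _ in range(steps):
        # Net inputs are computed from the previous state, so all
        # units update in parallel.
        net_b = evidence_b - inhibition * p
        net_p = evidence_p - inhibition * b + feedback * word
        net_word = p  # the word unit is supported by /p/ only
        b = clamp(b + rate * (net_b - b))
        p = clamp(p + rate * (net_p - p))
        word = clamp(word + rate * (net_word - word))
    return b, p, word


# Ambiguous input: equal bottom-up evidence for /b/ and /p/.
print(settle(0.5, 0.5))                # with top-down feedback
print(settle(0.5, 0.5, feedback=0.0))  # bottom-up only
```

With feedback switched on, the word unit tips the ambiguous input toward /p/; with feedback set to zero, the two phonemes stay tied. That contrast is the sense in which the processing is interactive rather than purely bottom-up.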
Speech Perception and the Brain
- Massive parallelism satisfies the 100-step maximum.
- TRACE uses brain-style computation; HEARSAY does not.
- According to the WLG Model of aphasia:
- Wernicke's area stores information about word sounds.
- Broca's area is the speech planning and programming area.
- Functional imaging studies confirm that these areas are involved in speech perception.
- They are also involved in reading (phonological recoding)!