Speech Recognition

Course topics

The course, which draws on material from the speech recognition course developed by Microsoft, covers the following topics:

  1. Background and mathematical foundations
  2. Speech signal processing
  3. Acoustic modeling
  4. Language modeling
  5. Speech decoding
  6. Advanced acoustic modeling

Background

  1. Phonetics (basic definitions)
  2. Performance metrics
  3. Fundamental equation of speech recognition

Short problem set

  1. Consonants are characterized by:

a) Vibration of the vocal cords at a particular frequency

b) Significant constriction of airflow in the airway or mouth

c) Loud volume compared to vowels

d) Number of syllables

  2. The phonetic pronunciation n ey sh ah n corresponds to the English word:

a) Nasal

b) Nathan

c) Nation

d) Nissan

  3. Assume that 1% of the population are musicians and that 10% of the total population is left-handed. A recent survey of musicians reveals that 60% of them are left-handed. What is the probability that a left-handed toddler will be a musician?

  4. What would be the new probability if 50% of the population was left-handed?

  5. What is the name for the atomic unit of speech sound?

  6. Which of these is a common measure of statistical significance for speech recognition?

a) Student-t Test

b) Chi-Squared Test

c) Matched Pairs Test

d) Binomial Test
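
The two left-handed musician questions above can be checked numerically with Bayes' rule. The following is a minimal sketch: the function name `posterior` and the variable names are illustrative choices, and it assumes that toddlers follow the same statistics as the general population.

```python
from fractions import Fraction


def posterior(prior: Fraction, likelihood: Fraction, evidence: Fraction) -> Fraction:
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Values taken from the problem statement.
p_musician = Fraction(1, 100)              # P(musician)
p_left_given_musician = Fraction(60, 100)  # P(left-handed | musician)

# P(musician | left-handed) when 10% of the population is left-handed.
p_10 = posterior(p_musician, p_left_given_musician, Fraction(10, 100))

# Same quantity when 50% of the population is left-handed.
p_50 = posterior(p_musician, p_left_given_musician, Fraction(50, 100))

print(p_10, float(p_10))  # 3/50 0.06
print(p_50, float(p_50))  # 3/250 0.012
```

Exact fractions are used here only to avoid floating-point rounding in the printed results; ordinary floats would work just as well.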

  7. The fundamental equation of speech recognition uses which mathematical rule?

a) L'Hôpital's Rule

b) Chi-Squared Test

c) Bayes’ Rule

d) Binomial Test

  8. Which of these can be used to measure the performance of a Speech Recognition System?

a) Word Error Rate

b) Sentence Error Rate

c) Latency

d) All of the Above
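
As an illustration of the first metric listed above, word error rate is usually defined as the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below assumes that definition and whitespace tokenization; the function name `word_error_rate` is an illustrative choice, not part of the course material.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(round(wer, 3))  # 0.167
```

Because insertions count as errors, this value can exceed 1.0 when the hypothesis is much longer than the reference.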

  9. In the fundamental equation of Speech Recognition, P(W) represents the…

a) Acoustic Model

b) Language Model

c) Pronunciation Model

d) None of the Above

  10. In the fundamental equation of Speech Recognition, P(O|W) represents the… (Choose One)

a) Acoustic Model

b) Language Model

c) Pronunciation Model

d) None of the Above
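
For reference while working through the last few questions, the fundamental equation is commonly written as follows. In this sketch, O denotes the sequence of acoustic observations and W a candidate word sequence; the hat notation for the recognizer's output is an assumption of this note rather than something taken from the course material.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

The denominator P(O) can be dropped in the last step because it does not depend on W and therefore does not change which word sequence maximizes the expression.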