This process involves determining what is said and taking an action based on the perceived information. Voice-recognition systems generally work on the frequency content of the spoken words. Any signal may be decomposed into a series of sines and cosines of different frequencies and amplitudes.
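The decomposition described above can be illustrated with a discrete Fourier transform. The following is a minimal sketch, not the method of any particular recognition system: the function names, the synthetic two-tone signal, and the 1000 Hz sample rate are all assumptions made for illustration, and a practical system would use an optimized FFT routine rather than this direct summation.

```python
import cmath
import math

def dft_amplitudes(samples):
    """Return the amplitude of each frequency bin (0 .. N/2) of a real signal."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid of k cycles per window.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        # Scale so a pure sinusoid of amplitude A reports A.
        # The DC bin and the Nyquist bin (even N only) are not doubled.
        scale = 1 / n if (k == 0 or 2 * k == n) else 2 / n
        amplitudes.append(abs(coeff) * scale)
    return amplitudes

def major_frequencies(samples, sample_rate, count):
    """Return the `count` strongest frequencies in Hz, sorted ascending."""
    amps = dft_amplitudes(samples)
    bin_width = sample_rate / len(samples)
    strongest = sorted(range(len(amps)), key=lambda k: amps[k], reverse=True)[:count]
    return sorted(k * bin_width for k in strongest)

# Synthetic stand-in for a spoken sound: 50 Hz at amplitude 1.0 plus 120 Hz at 0.5.
SAMPLE_RATE = 1000
signal = [math.sin(2 * math.pi * 50 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 120 * t / SAMPLE_RATE)
          for t in range(200)]

peaks = major_frequencies(signal, SAMPLE_RATE, 2)
print(peaks)
```

The list of strongest frequencies returned by `major_frequencies` plays the role of the "signature" discussed in the text.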
It is assumed that every word (or letter), when decomposed into its constituent frequencies, will have a unique signature composed of its major frequencies, which allows the system to recognize the word. The user must train the system by speaking the words a priori so that the system can create a lookup table of the major frequencies of the spoken words.
When a word is spoken and its frequencies are determined, the result is compared with the lookup table. If a close match is found, the word is recognized. A universal system that recognizes all accents and variations in speaking may be neither possible nor useful.
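The training and matching steps can be sketched as follows. This is a hedged illustration only: the function names `train` and `recognize`, the averaging of repetitions, the maximum-deviation error measure, and all frequency values are assumptions, not details given in the text.

```python
def train(table, word, repetitions):
    """Store the average of several training signatures under `word`."""
    count = len(repetitions)
    table[word] = [sum(freqs) / count for freqs in zip(*repetitions)]

def recognize(table, signature, tolerance_hz):
    """Return the closest stored word, or None if no entry is close enough."""
    best_word, best_error = None, None
    for word, stored in table.items():
        # Largest deviation between corresponding major frequencies.
        error = max(abs(a - b) for a, b in zip(signature, stored))
        if best_error is None or error < best_error:
            best_word, best_error = word, error
    if best_error is not None and best_error <= tolerance_hz:
        return best_word
    return None

# Hypothetical signatures (major frequencies in Hz) from two repetitions per word.
table = {}
train(table, "stop", [[310.0, 920.0, 2400.0], [300.0, 900.0, 2380.0]])
train(table, "go", [[450.0, 1100.0, 2600.0], [460.0, 1120.0, 2620.0]])

print(recognize(table, [308.0, 905.0, 2400.0], tolerance_hz=25.0))   # close to "stop"
print(recognize(table, [380.0, 1000.0, 2500.0], tolerance_hz=25.0))  # no close match
```

Returning `None` when nothing falls within the tolerance corresponds to the system declining to recognize a word rather than guessing.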
For better accuracy, it is necessary to train the system with more repetitions. However, the more precisely the frequencies are specified, the narrower the allowable variations become. If the system tries to match many frequencies for better accuracy, then in the presence of any noise or any variation in the spoken words it will not be able to recognize the word.
On the other hand, if only a limited number of frequencies is matched in order to allow for variations, the system may confuse the word with other, similar words.
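The trade-off above can be made concrete with a small sketch. The two similar words, their frequency values, the 10 Hz tolerance, and the `candidates` helper are all hypothetical, chosen only to show the effect of matching more or fewer frequencies.

```python
# Hypothetical stored signatures (major frequencies in Hz) for two similar words.
TABLE = {
    "left": [300.0, 900.0, 2400.0, 3100.0],
    "lift": [300.0, 900.0, 2100.0, 3400.0],
}

def candidates(spoken, num_freqs, tolerance_hz):
    """Return every stored word whose first `num_freqs` frequencies match."""
    found = []
    for word, stored in TABLE.items():
        pairs = zip(stored[:num_freqs], spoken[:num_freqs])
        if all(abs(a - b) <= tolerance_hz for a, b in pairs):
            found.append(word)
    return sorted(found)

# "left" spoken with some noise; the fourth frequency has drifted by 35 Hz.
noisy_left = [304.0, 893.0, 2408.0, 3135.0]

strict = candidates(noisy_left, num_freqs=4, tolerance_hz=10.0)
loose = candidates(noisy_left, num_freqs=2, tolerance_hz=10.0)
middle = candidates(noisy_left, num_freqs=3, tolerance_hz=10.0)

print(strict)  # matching all four frequencies: the noisy word is rejected
print(loose)   # matching only two: both similar words qualify
print(middle)  # matching three: only the intended word qualifies
```

In this contrived case three frequencies happen to be the right compromise; with real speech the suitable number and tolerance would have to be found experimentally.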
Many robots have been equipped with voice-recognition systems in order to communicate with their users. In most cases, the robot is trained by the user and can recognize words that each trigger a certain action in response. When the voice-recognition system recognizes a word, it sends a signal to the controller, which, in turn, runs the robot as desired.
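The hand-off from recognizer to controller can be sketched as a simple word-to-command table. The `Controller` class, the command strings, and the `on_word_recognized` function below are hypothetical stand-ins; a real robot controller would expose its own interface.

```python
class Controller:
    """Hypothetical stand-in for a robot controller; it only records commands."""

    def __init__(self):
        self.log = []

    def run(self, command):
        self.log.append(command)

# Assumed mapping from trained words to controller commands.
ACTIONS = {
    "forward": "drive forward",
    "stop": "halt all joints",
    "home": "move to home position",
}

def on_word_recognized(word, controller):
    """Send the command for `word` to the controller; ignore untrained words."""
    command = ACTIONS.get(word)
    if command is None:
        return False
    controller.run(command)
    return True

controller = Controller()
handled = on_word_recognized("stop", controller)
ignored = on_word_recognized("dance", controller)
print(controller.log)
```

Words without an entry in the table are ignored, so an unrecognized utterance never reaches the controller.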
Voice synthesis is accomplished in two different ways.
One is to recreate each word by combining phonemes and vowels; this can be accomplished with a commercially available phoneme chip and a corresponding program. Although this type of system can reproduce any word, it sounds unnatural and machine-like.
The alternative is to record the words that the system may need to synthesize and to access them from memory or tape as needed. Although this type of system sounds very natural, it is limited to the recorded vocabulary. It can be used as long as all the words that the machine needs to say are known a priori.
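The difference between the two approaches can be sketched in a few lines. Everything here is illustrative: the short sample lists stand in for real audio, and the phoneme labels, dictionaries, and function names are assumptions rather than the interface of any actual phoneme chip.

```python
# Recorded-word approach: one stored clip per word (tiny stand-in sample lists).
RECORDED = {
    "battery": [0.1, 0.3, -0.2],
    "low": [0.4, -0.1],
}

# Phoneme approach: one stored unit per speech sound (hypothetical labels).
PHONEME_SOUNDS = {
    "L": [0.2],
    "OW": [0.5, 0.4],
    "G": [-0.3],
}

def speak_recorded(sentence):
    """Concatenate stored clips; fails for any word that was never recorded."""
    samples = []
    for word in sentence.lower().split():
        if word not in RECORDED:
            raise KeyError(f"no recording for {word!r}")
        samples.extend(RECORDED[word])
    return samples

def speak_phonemes(phonemes):
    """Build a word from phoneme units; any word spelled in known phonemes works."""
    samples = []
    for phoneme in phonemes:
        samples.extend(PHONEME_SOUNDS[phoneme])
    return samples

print(speak_recorded("Battery low"))
print(speak_phonemes(["L", "OW"]))  # "low" built from units
print(speak_phonemes(["G", "OW"]))  # "go", which was never recorded as a word
```

The recorded approach raises an error for the word "go" because no clip exists, while the phoneme approach can assemble it from existing units, which mirrors the vocabulary limitation described above.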