Voice Recognition
Voice as access key! In this article we present an eight-channel speech recognition system able to pick up even the smallest nuances of the human voice. The system relies on a new speech recognition technique modeled on the human brain, for which an accuracy of about 99% is claimed. The assembly also includes a voice synthesis circuit that gives the user all the necessary information orally.
Who has never seen a science fiction film in which computers and automated systems answer by voice and act only after receiving a spoken order?
Science fiction, indeed, but not for much longer. As far as speech synthesis is concerned, "machines that speak" have been built for many years, using integrated circuits of various types capable of reproducing sentences faithfully, whatever the language.
Speech recognition is a much more complicated subject, and significant results have been obtained only recently. To recognize whole words or phrases, devices with very high computing power are needed to discern the nuances of the human voice. The day when a machine recognizes and understands exactly whatever a person says is still distant, but great strides have been made in this direction.
For some time, very complex software has been able to turn spoken words into text, even if its accuracy is low. It is even harder to build stand-alone systems, that is, systems that do not rely on a computer. In that case, the usual solution is to use microcontrollers designed specifically for digital signal processing (DSP), which analyze the signal with very complex algorithms that require a large external memory.
Recently, products that are more sophisticated and more efficient than their predecessors, yet also more flexible and cheaper, have been developed. Among the most active and dynamic manufacturers in this field is the Californian company Sensory, which has developed a series of devices based on a technology called "neural network recognition", whose recognition and search logic resembles that of the human brain. This technique considerably reduces the hardware required while achieving an accuracy of about 99%, against 96% for the most complex DSP-based systems. Using this technique, Sensory has developed and marketed two integrated circuits dedicated to this application, the RSC-164 and the RSC-264.
Both circuits can operate with a PC or stand-alone. Each of these devices integrates an 8-bit CPU running at 4 MIPS (4 million instructions per second) based on the Intel 8051, an A/D and a D/A converter with their respective filters, 64 Kbytes of ROM and 384 bytes of RAM, a bus for driving external memory, and a set of general-purpose I/O (input/output) pins, all in a 68-pin PLCC or 64-pin QFP package. Obviously, such a high pin density makes the integrated circuit difficult for hobbyists to use. For this reason we initially dropped the idea of building a project around this circuit.
Fortunately, a few months ago Sensory began marketing a product called "Voice Direct Module", which contains the circuit in question, the memory and the other components, and considerably simplifies building such a system.
But the most important thing for us is that its connector consists of three rows of pins with a 2.54 mm pitch. With this module we have developed a system able to recognize eight sentences or words and to switch the same number of relays.
This is obviously only a first approach to this technology; nevertheless the proposed circuit works, and works well: it "never gets it wrong". The device can recognize up to 60 words or phrases, each lasting at most 3.2 seconds. What is more, the circuit is fully interactive: it can generate almost 500 words or phrases that guide the user through every step, both during the learning phase and during normal use.
These sentences are in English and are stored in a ROM inside the module. It is therefore not possible to change the content of this library in order to make the circuit "talk" in other languages.
To operate properly, the circuit requires a learning phase during which the user must speak into a microphone the words or phrases that the system will have to recognize afterwards.
During this phase, each word or phrase is analyzed and converted into digital data: a unique signature that takes into account every variable, including intonation, inflection and the rate of speech. After that, the circuit responds only and exclusively to the voice of the person who carried out the training.
The digital data is stored in an external serial EEPROM.
During the recognition phase, the system performs the same analysis and searches the stored data for a matching entry. If the search succeeds, it activates the corresponding output and announces that the word or phrase has been identified by saying "accepted". Otherwise, the circuit activates no output and says "unrecognized word". When in doubt, it issues the message "repeat to confirm" before accepting the order.
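The three possible replies described above can be summarized in a short sketch. This is only an illustration of the decision logic as we understand it, not the module's firmware: the similarity score, the function name `respond` and both thresholds are hypothetical, since no numeric values are given for the module.

```python
# Hypothetical thresholds: the article gives no numeric values for the module.
ACCEPT_AT = 0.90   # at or above this score the match is taken as certain
DOUBT_AT = 0.70    # between DOUBT_AT and ACCEPT_AT the system asks again


def respond(best_word, score):
    """Return (spoken message, output to activate or None) for the best match."""
    if score >= ACCEPT_AT:
        return ("accepted", best_word)
    if score >= DOUBT_AT:
        return ("repeat to confirm", None)
    return ("unrecognized word", None)


print(respond(3, 0.95))  # ('accepted', 3)
print(respond(3, 0.80))  # ('repeat to confirm', None)
print(respond(3, 0.40))  # ('unrecognized word', None)
```

Only the first case activates an output; in the other two the relays stay at rest.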
All functions are managed by three push buttons: P3 is used during the learning phase, P2 erases the stored data, and P1 serves as the start button during normal use.
After P1 is pressed, the system asks for a word ("say a word") and gets ready to recognize the word or phrase and act accordingly.
To avoid having to press the start button by hand during normal operation, we added a timed vox circuit driven by the same microphone that feeds the voice recognition circuit. It is then enough to approach the system and say "activation", or any other word, out loud. This is equivalent to pressing P1. The system then prompts the user to say a word and carries out the identification. To prevent the vox from triggering again during this phase, we added a timer that inhibits it for about 10 seconds, enough time to complete the recognition process. In our application, the circuit assigns one relay output to each word or sentence learned.
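The effect of the vox inhibit timer can be pictured with a small simulation. It is a sketch under assumptions: the function name `vox_start_pulses` is ours, the 10-second figure is the approximate value quoted above, and we assume the timer is not retriggered by sounds arriving while it is running.

```python
INHIBIT_SECONDS = 10.0  # approximate lockout quoted in the text


def vox_start_pulses(sound_times, inhibit=INHIBIT_SECONDS):
    """Return the times at which the vox would simulate a press on P1.

    sound_times: moments (in seconds) when the microphone picks up a loud sound.
    Assumption: a sound arriving during the lockout is ignored and does not
    extend the lockout.
    """
    pulses = []
    locked_until = float("-inf")
    for t in sorted(sound_times):
        if t >= locked_until:
            pulses.append(t)            # vox fires: equivalent to pressing P1
            locked_until = t + inhibit  # further sounds are ignored until then
    return pulses


print(vox_start_pulses([0.0, 1.5, 4.0, 12.0, 13.0]))  # [0.0, 12.0]
```

In this example the words spoken at 1.5 s and 4.0 s reach the recognition module without restarting the cycle, which is exactly the purpose of the timer.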
We used only eight relay outputs, and therefore only eight of the sixty words available. In this configuration the output logic becomes very simple, so much so that each of the eight output lines is connected directly to one of eight power stages.
To complete the circuit, we added a low-frequency amplifier stage capable of delivering 1 watt and a voltage regulator that supplies the module with a stabilized 5 volts.
Analysis of the circuit diagram
To understand fully how the circuit works, we must first of all take a look at module M1. This device, built entirely with SMT components, is very compact and connects to the external components through three in-line connectors with a 2.54 mm pitch, named JP1, JP2 and JP3. The pin numbering of module M1 shown on the circuit diagram refers to the pins of connector JP2. The module is powered by 5 volts applied between pin 4 (positive) and pins 3 and 5 (negative). Pins 6 and 7, connected to the internal PWM generator, are tied to ground through two resistors. The microphone signal is applied to pin 1, and the voice output signal is available on pin 8.
The data bus is on pins 12 to 19. Each of these lines has a different logical "weight": the first line (pin 12) is worth "1", the second (pin 13) "2", and so on up to the eighth (pin 19), which is worth "8". Suppose the memory is used to its full capacity of 60 words or phrases. When a word is identified, for example the third stored word, the third output line (pin 14) goes to a high logic level for a brief moment. The same happens on the seventh line (pin 18) when the seventh word is identified. If, on the other hand, the circuit recognizes the word in eleventh place in memory, which outputs become active? The answer is intuitive: line 8 (pin 19) together with line 3 (pin 14), since 8 + 3 = 11. If you do not want the relays to behave erratically, you must limit yourself to the first 8 memory positions.
Pins 10 and 11 and the button lines are assigned other functions. Line 11 controls the learning section.
Pressing P3 starts the learning phase for phrases or words. The entire process is guided by the speech synthesis generated by the integrated circuit. To stop the learning phase, press P3 briefly. A long press (at least one second) clears all data in the EEPROM, and the learning cycle starts again from the beginning. Mini switch DS1b sets the level of accuracy: closed, maximum precision is obtained; open, the tolerance increases slightly. The same applies to the other switch of DS1, which acts instead during the recognition phase. With both switches closed, we obtain an accuracy of about 99%, or at least that is what the manufacturer claims. Indeed, during our tests a wrong output was never activated; on the other hand, the system sometimes fails to identify the word at the first attempt.
To start speech recognition, press P1 and follow the voice instructions given by the circuit. However, to make the system as convenient as possible in normal operation, we added a vox stage built around transistors T1 and T2 and the logic network formed by the gates of U2. The audio picked up by the microphone is sent to the input of M1 (pin 1). At the same time, this signal is rectified by diode D1 and turned into a pulse that triggers the monostable U2c through gate U2a. This stage in turn drives transistor T2, which is wired in parallel with push button P1. The sensitivity of the microphone is deliberately low, so that the circuit is not triggered needlessly and so that, during the identification phase, noise is not superimposed on the user's sentence and does not disturb the recognition process.
In practice, to activate the device you must speak at a distance of 10 to 20 cm from the microphone, and a similar distance is required for the recognition phase.
The audio signal generated by the module (recall that its memory contains nearly 500 phrases) is available on pin 8. This signal is sent to integrated circuit U3 for amplification.
The circuit used in this stage is the TBA820M, able to deliver 1 watt into an 8 ohm load. The RC networks connected to its various pins set the closed-loop gain and limit the bandwidth at both the low and the high end. The amplified signal available on pin 5 drives the 8 ohm loudspeaker connected between the output and ground. The volume is adjusted with potentiometer R15, whose wiper is connected directly to the input pin (pin 3) of the TBA820M.
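To give an idea of what 1 watt into 8 ohms means at the loudspeaker terminals, the short calculation below applies P = V²/R. It is a sketch that assumes a sine-wave signal and an ideal resistive load; the function name `sine_drive_for_power` is ours.

```python
import math


def sine_drive_for_power(power_w, load_ohm):
    """Return (V rms, I rms, V peak) for a sine wave delivering power_w into load_ohm."""
    v_rms = math.sqrt(power_w * load_ohm)  # from P = V^2 / R
    i_rms = v_rms / load_ohm               # Ohm's law
    v_peak = v_rms * math.sqrt(2)          # sine wave: peak = rms * sqrt(2)
    return v_rms, i_rms, v_peak


v_rms, i_rms, v_peak = sine_drive_for_power(1.0, 8.0)
print(f"{v_rms:.2f} V rms, {i_rms * 1000:.0f} mA rms, {v_peak:.2f} V peak")
# 2.83 V rms, 354 mA rms, 4.00 V peak
```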
The eight output lines (pins 12 to 19) are connected to eight power stages, each driving a relay.
When a line is activated, it goes from low to high, which saturates the transistor. The collector of the transistor drives a relay and an LED, which are activated briefly.
The relay contacts can be used to control any electrical or electronic device, such as an electric door lock. Imagine the surprise produced by a system that opens the front door when you simply say "open the door"! But the most fascinating thing is that if someone else tries to open the door in the same way, the system will invariably reply: "unrecognized word".
To power the circuit, use a 12 volt DC supply capable of delivering 200 to 300 mA. This voltage directly feeds the low-frequency power stage and the eight relay outputs. The other stages, including module M1, are powered by 5 volts supplied by the integrated regulator U1, a 7805.
Figure 1: Circuit diagram of the 8-channel speech recognition system.
Figure 2: Component layout.
Figure 3: Printed circuit board of the 8-channel speech recognition system, at 1:1 scale.
Figure 4: Overall view of the 8-channel speech recognition system.
Parts list
R1: 10 kΩ
R2: 1 kΩ
R3: 6.8 kΩ
R4: 1 kΩ
R5: 220 kΩ
R6: 1 kΩ
R7: 10 kΩ
R8: 220 kΩ
R9: 10 kΩ
R10: 120 Ω
R11: 10 kΩ
R12: 1.2 kΩ
R13: 47 kΩ
R14: 47 kΩ
R15: 47 kΩ, adjustable
R16: 150 Ω
R17: 56 Ω
R18: 1 Ω
R19: 2.2 Ω
RA: 10 kΩ (8 pcs)
RB: 22 kΩ (8 pcs)
RC: 1 kΩ (8 pcs)
C1: 470 µF / 35 V electrolytic
C2: 470 µF / 25 V electrolytic
C3: 10 µF / 63 V electrolytic
C4: 47 µF / 25 V electrolytic
C5: 47 µF / 25 V electrolytic
C6: 47 µF / 25 V electrolytic
C7: 1 µF / 63 V electrolytic
C8: 100 nF multilayer
C9: 100 µF / 25 V electrolytic
C10: 47 µF / 25 V electrolytic
C11: 150 pF ceramic
C12: 100 nF multilayer
C13: 470 µF / 25 V electrolytic
C14: 220 µF / 25 V electrolytic
C15: 220 µF / 16 V electrolytic
C16: 100 nF multilayer
C17: 100 nF multilayer
CA: 100 µF / 25 V electrolytic (8 pcs)
D1: 1N4148 diode
D2: 1N4148 diode
D3: 1N4148 diode
D4: 1N4007 diode
DA: 1N4007 diode (8 pcs)
T1: BC547B NPN transistor
T2: BC547B NPN transistor
TA: BC547B NPN transistor (8 pcs)
LDA: red LED (8 pcs)
U1: 7805 voltage regulator
U2: CD4093 integrated circuit
U3: TBA820M integrated circuit
M1: Sensory Voice Direct module
RLA: 12 V relay, 1 changeover contact (8 pcs)
DS1: 2-pole DIP switch
P1: push button, N/O
P2: push button, N/O
P3: push button, N/O
MIC: 2-pin electret microphone
HP: 8 Ω, 1 W loudspeaker
Miscellaneous:
1 x 8-pin IC socket
1 x 14-pin IC socket
3 x 2-pin terminal block for PCB
8 x 3-pin terminal block for PCB
1 x 50-pin breakable socket strip
1 x PCB ref. L026
The module used in this project, produced by the Californian company Sensory, is part of a family of devices called Interactive Speech, designed specifically for speech recognition. The module used in our application (Voice Direct IC Module) makes it very simple to build a complete voice recognition system.
For connection, the module has three in-line rows of pins with a 2.54 mm pitch.
Construction
Thanks to the Voice Direct IC Module, which contains the RSC-164 integrated circuit, building this project is certainly within the reach of all our readers. To wire the system, we designed a printed circuit board that carries all the components. It is drawn at 1:1 scale. For easy construction, produce it by the photographic method, which gives a board identical in every respect to ours. To connect the module, use three strips of tulip-type socket pins with a 2.54 mm pitch, with 14, 17 and 19 pins, arranged as shown on the component layout.
First mount the lowest-profile components, then the polarized components (mind the + and - markings). Continue with the diodes and transistors, checking their orientation. For the two integrated circuits, use two sockets, one 8-pin and one 14-pin. Then mount and solder the eight relays and the screw terminals for the external connections. Finally, insert module M1, which can be fitted in only one direction. At this point, check the wiring thoroughly for any error or bad solder joint.
To work properly, the system needs a learning phase. For this, the vox circuit must be disabled, which is done simply by not fitting U2 or by removing it. Also choose the accuracy of the system for both the learning phase and the recognition phase. For a first try, leave both switches open (lowest accuracy). Press P3 and follow the instructions of the system, which asks you to say the word or phrase that will activate output number 1 ("say word one") and to repeat the same word to confirm ("repeat to confirm"); it then says "accepted" and moves on to the next word ("say word two"). The word or phrase must not last longer than 3.2 seconds. For the reasons explained above, speak at a distance of about 20 centimeters from the microphone. To stop the learning phase, press P2 briefly.
Remember that to erase all data in memory, you must hold P2 down for longer (more than 1 second). Do not store more than 8 sentences, so that no more than one of the 8 control lines is activated at a time.
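The reason for this limit is easier to see with the weighting of the output lines written out. The sketch below follows the examples given in the circuit analysis (word 3 on pin 14, word 7 on pin 18, word 11 on pins 19 and 14). Treating every word from 9 to 15 as "line 8 plus one other line" is our assumption, generalized from the single example of word 11; the text does not say how higher positions are signaled, so the sketch refuses them. The function names are ours.

```python
FIRST_OUTPUT_PIN = 12  # JP2 pin of output line 1, as described in the text
LINES = 8              # number of output lines (pins 12 to 19)


def active_lines(word_index):
    """Return the output lines that go high when the given stored word is recognized."""
    if 1 <= word_index <= LINES:
        return [word_index]                  # words 1 to 8: one line each
    if LINES < word_index < 2 * LINES:
        # Assumption generalized from the example 11 = 8 + 3
        return [word_index - LINES, LINES]
    raise ValueError("encoding not described in the article for this position")


def active_pins(word_index):
    """Same as active_lines, expressed as JP2 pin numbers."""
    return [FIRST_OUTPUT_PIN + line - 1 for line in active_lines(word_index)]


print(active_pins(3))   # [14]
print(active_pins(7))   # [18]
print(active_pins(11))  # [14, 19]
```

With one relay per line, word 11 would pull in relays 3 and 8 at the same time, which is why only the first 8 positions are used here.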
At this point we are ready for recognition.
Press P1 and follow the voice prompt ("say a word"). The system should reliably recognize the eight stored sentences. Of course, the speaker must be the person who carried out the learning phase, because the same sentence spoken by someone else will not be recognized. If everything works correctly, switch off the power, insert integrated circuit U2 and check the operation of the vox. To start the system, simply speak loudly enough into the microphone, saying any word, to trigger the vox: the circuit generates a pulse that simulates the closing of P1.