From “audible” to “understandable”


In October, at the Smartisan phone launch, jokingly billed as the best crosstalk evening in the Eastern Hemisphere, the voice recognition technology of Smartisan Technology came under the spotlight. CEO Yong Hao Luo spent 20 minutes explaining it.

In the security field, audio has long been an important component. Although over 70% of network cameras offer one-way or multi-way audio, few surveillance cameras actually use it. Analyses of the audio monitoring market expect it to sustain growth of around 10%. IHS predicts that audio functions will gain more attention in video surveillance applications.

Many of the video recordings that police review when investigating an incident turn out to have no audio to corroborate the statements of those involved. The lack of audio monitoring has thus become a barrier to solving cases and to dispelling public doubt. Adding audio monitoring addresses this problem and makes the evidence more convincing.

As a result, many intelligent security systems now require audio capture and monitoring, for example in safe city projects, procuratorial organs, financial institutions, public transport, exam monitoring, administrative services, and law enforcement evidence collection. More and more high-end projects require high-definition, high-fidelity audio and video surveillance systems.

How to turn “audible” into “understandable”

Voiceprint Recognition

After years of development, voiceprint recognition can infer a speaker’s emotion and surroundings from the voice, which underpins the further development of audio monitoring.

As a biometric technology, voiceprint recognition automatically identifies a speaker from voice parameters, extracted from the speech waveform, that reflect his or her physiological and behavioral characteristics. It is important to distinguish voiceprint recognition from voice recognition. The former focuses on who is speaking rather than on what is said, and so emphasizes what is individual to each speaker. The latter aims to recognize the meaning of the speech signal regardless of who is speaking, and so emphasizes what is common across speakers.

Voiceprint recognition also has particular advantages in application, compared to voice recognition:

(1) A voiceprint is more convenient and natural to obtain, so users accept it more readily;

(2) Recognition is simpler and cheaper, requiring only a microphone and no other equipment;

(3) Remote identification is easy with a microphone, phone or cellphone over a network (a communication network or the Internet);

(4) The voiceprint recognition algorithm is simple;

(5) Accuracy is higher when it is combined with other measures, such as voice recognition, for message authentication.

These advantages are winning voiceprint recognition more and more supporters among both system developers and users. It holds a market share of 15.8%, second only to biometric technologies such as fingerprint and palm print recognition, and the share is rising.
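The identification step described in this section can be illustrated with a minimal Python sketch. It assumes that each voice sample has already been reduced to a fixed-length feature vector (a "voiceprint") and that speakers are matched by cosine similarity. The function names, the threshold of 0.8 and the three-dimensional vectors are all hypothetical and chosen only for illustration. Real systems use far longer feature vectors and more elaborate scoring.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify_speaker(sample, enrolled, threshold=0.8):
    """Return the best-matching enrolled speaker, or None if nobody is close enough.

    The threshold is an assumed value, not a recommended setting.
    """
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(sample, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Made-up voiceprints for two enrolled speakers (hypothetical data).
enrolled = {
    "speaker_a": [0.9, 0.1, 0.3],
    "speaker_b": [0.1, 0.8, 0.5],
}

match = identify_speaker([0.85, 0.15, 0.35], enrolled)   # close to speaker_a
stranger = identify_speaker([-0.5, 0.2, -0.7], enrolled)  # close to nobody
```

Note that the sketch never looks at what was said, only at how similar the voice features are. That is the difference between voiceprint recognition and voice recognition drawn above.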

Sound Localization

Humans can locate a sound source by hearing. For example, if you hear a sound while walking, you can immediately tell what kind of sound it is, how threatening it is and where it comes from. The judgement is fast because the brain compares the information collected by the two ears to work out the direction and distance of the sound. Sound localization is thus a complex, integrated function of the auditory system.

Sound localization draws on cues such as intensity difference, time difference, timbre (tone color) difference and phase difference.
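Of these cues, the time difference is the easiest to illustrate. The Python sketch below assumes two microphones a known distance apart, a distant source (far-field model) and a speed of sound of 343 m/s. It finds the delay between the two channels by brute-force cross-correlation and converts it to an arrival angle. The function names, the sample rate and the 0.2 m spacing are assumptions made for this example, not details from any particular product.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed)

def estimate_delay(left, right, max_lag):
    """Return the lag in samples at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def bearing_from_delay(lag, sample_rate, mic_spacing):
    """Convert a sample delay to an angle in degrees.

    0 means straight ahead; positive angles point toward the left microphone.
    Far-field model: delay = mic_spacing * sin(angle) / speed of sound.
    """
    delay = lag / sample_rate
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))

# Synthetic test signal: a short pulse that reaches the right channel
# three samples after the left channel.
left = [0.0] * 50
right = [0.0] * 50
left[20], left[21] = 1.0, 0.5
right[23], right[24] = 1.0, 0.5

lag = estimate_delay(left, right, max_lag=10)
angle = bearing_from_delay(lag, sample_rate=48000, mic_spacing=0.2)
```

With real recordings the correlation peak is blurred by noise and echoes, so practical systems usually add filtering and more than two microphones. The intensity and phase cues mentioned above would need separate processing.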

Audio Environment Analysis

Hearing, an important channel through which humans perceive their surroundings, is also a key complement to vision, and plays an irreplaceable role in poor lighting or when the line of sight is obstructed. Compared with image data, audio signals can be captured with simpler equipment and need less storage space and processing time. As the computing power of mobile platforms improves, more and more audio-based applications are coming into use. Audio processing algorithms are at the core of the related research. In particular, extracting, analyzing and making effective use of the semantic information in audio data is of great significance for content-based multimedia indexing and summarization, and for the development of context-aware applications.

Audio environment analysis aims to detect, classify and raise alarms on abnormal events in the monitored environment. Its core technique combines the time-domain and frequency-domain features of various abnormal sounds with pattern recognition classification methods.
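As a rough illustration of combining time-domain and frequency-domain features with a classifier, the Python sketch below computes two time-domain features (RMS energy and zero-crossing rate) and one frequency-domain feature (spectral centroid), then assigns a frame to the nearest class prototype. The feature choice, the nearest-prototype rule, the prototype values and the class names are all assumptions made for this example. A real system would learn its classes from labeled recordings of abnormal sounds.

```python
import cmath
import math

def rms_energy(frame):
    """Time domain: root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Time domain: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def spectral_centroid(frame, sample_rate):
    """Frequency domain: magnitude-weighted mean frequency in Hz (naive DFT)."""
    n = len(frame)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        magnitude = abs(coeff)
        weighted += magnitude * k * sample_rate / n
        total += magnitude
    return weighted / total if total else 0.0

def extract_features(frame, sample_rate):
    """Feature vector: (RMS, zero-crossing rate, centroid / Nyquist frequency)."""
    nyquist = sample_rate / 2
    return (rms_energy(frame),
            zero_crossing_rate(frame),
            spectral_centroid(frame, sample_rate) / nyquist)

def classify(features, prototypes):
    """Return the name of the prototype closest to the feature vector."""
    return min(prototypes, key=lambda name: math.dist(features, prototypes[name]))

def tone(freq, sample_rate, n, phase=0.3):
    """Synthetic sine tone used as stand-in audio."""
    return [math.sin(2 * math.pi * freq * i / sample_rate + phase) for i in range(n)]

SAMPLE_RATE = 8000
# Hypothetical class prototypes; not measured from real sounds.
PROTOTYPES = {"low_tone": (0.7, 0.06, 0.06), "high_tone": (0.7, 0.5, 0.5)}

low_label = classify(extract_features(tone(250, SAMPLE_RATE, 256), SAMPLE_RATE), PROTOTYPES)
high_label = classify(extract_features(tone(2000, SAMPLE_RATE, 256), SAMPLE_RATE), PROTOTYPES)
```

The naive DFT keeps the sketch dependency-free. For real audio an FFT library would be the usual choice.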

Voice Recognition

It has long been a dream of humans to make machines understand us and talk with us. Voice recognition aims to convert speech signals into the corresponding text or commands through a process of recognition and understanding. After decades of development, voice-based artificial intelligence with strong learning ability is set to be widely applied.

Audio monitoring has a brilliant future in the security field

Because of the limits of cameras and installation angles, traditional video surveillance systems struggle to achieve monitoring without blind spots, even with many cameras at different angles. Moreover, environmental factors (such as lighting conditions or interference from intense light sources) sometimes prevent them from capturing usable images of the whole scene. By contrast, audio monitoring generally has no blind spots and, thanks to its technical characteristics, gives a better grasp of the situation in real time. Audio monitoring can therefore make up for the shortcomings of video surveillance, and it has a promising future.

Audio also has other unique characteristics: it can be picked up by day and by night, it is hard to block, and it carries directional information. With a pickup installed on a dome (PTZ) camera, the system can quickly locate an abnormal sound and turn the camera toward it, capturing real-time video as the incident happens and providing more information for assessing it.

In the age of artificial intelligence, voice and image technologies and their applications are becoming increasingly important. The generation, analysis, organization and consumption of data from voice, image and other sensors will become a significant direction of development in artificial intelligence. We look forward to the great changes that intelligent audio will bring to the security field.

© Beijing Kuaiyu Electronic Co., Ltd. All Rights Reserved.