How Content-recognition Software Works
Introduction to How Content-recognition Software Works
You're in line for a movie when a great song plays over the sound system. Although you like the song, you have no idea what it's called or who sings it. You pull out your cell phone, dial a number and hold the phone toward the speakers. In a few seconds, you receive a text message with the name of the song, the artist's name and even a link you can follow to buy a copy.
Photo courtesy of stock.xchng: Content-recognition software can identify songs and protect them from copyright infringement.
The service you called uses content-recognition software to identify the song. These programs are helpful if you want to learn about a song playing nearby. They can also help to curb copyright infringement, which is a huge issue for independent artists and corporations alike.
Peer-to-peer networks, file-sharing services and heavy hitters like YouTube give people plenty of opportunities to access content without paying for it. Sites like YouTube normally count on users to report inappropriate material, but some users don't consider clips that violate copyright law to be inappropriate. As a result, most companies still have to rely on employees to uncover proprietary footage and file a report. It's a tedious, inefficient process that may soon become unnecessary thanks to content-recognition software. In this article, we'll explore exactly how the process works and how this software can help both people and businesses.
Developing the Software
Several software companies plan to offer programs that can analyze audio and video clips, compare them to a database of content and determine whether they are from sources that are protected by copyright. Such software provides an efficient and relatively inexpensive alternative to combing through the vast amount of content on the Internet. It's also more reliable than asking your friend if he knows what song is on the radio.
Photo used under the GNU Free Documentation License: Limewire is one of several file-sharing programs giving media companies massive headaches.
Building this kind of software is challenging, because pirated copies are rarely exact duplicates of the original. Some video pirates bring recording devices into theaters and capture movies on their own cameras. Some projectionists have been known to set up a digital video camera in the projection room, recording a first-run movie on its premiere night. Other people who bypass legal distribution might crop a video or otherwise alter it. Any program designed to find recordings like these can't rely only on how a file is encoded or on finding files that are identical to the original.
In the next section, we'll look at the process for identifying audio files and how it compensates for these challenges.
Content-recognition Software - Audio
The first step in identifying content is assembling a database of material that other files can be compared against. For a record company, this would include the company's entire music catalog. The content-recognition software analyzes each song and creates a digital tag identifying that song. Tags are called fingerprints or signatures.
Photo courtesy of stock.xchng: Software analyzes a song in parts looking for tags to identify the song.
The software analyzes the actual sound of the song rather than the way the file is encoded. Some programs analyze the tempo and beat of a song. Others measure the song's amplitude and frequency. Fingerprinting software usually takes several samples that last just a few seconds each from a single recording. A few companies offer software that analyzes entire audio clips in order to get as complete a fingerprint as possible. At least one current product analyzes a song for landmarks -- distinctive acoustic moments in the clip -- then analyzes the sound around the landmarks. Ideally, the landmarks will be readily identifiable when scanning other music.
The programs use algorithms to analyze sound. Most rely on some form of the Fast Fourier Transform (FFT). This mathematical technique breaks a complex signal into the individual frequencies that make it up. Applied to one short slice of a recording after another, it shows how the sound changes over time. These changes -- whether they're tempo changes, beats per minute or the amplitude and frequency of the sound in the clip -- are mapped out and mathematically converted into a digital fingerprint. Fingerprints are usually in numeric form.
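To make the idea concrete, here's a minimal Python sketch of turning sound into a numeric fingerprint. It is an illustration only, not any vendor's actual method: the function names (`fft`, `dominant_bins`, `tone`), the 64-sample window and the synthetic test tones are all assumptions made for this example. The sketch runs a small textbook FFT over each slice of a signal and records which frequency is loudest in that slice.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dominant_bins(signal, window=64):
    """Return the loudest frequency bin in each non-overlapping window."""
    peaks = []
    for start in range(0, len(signal) - window + 1, window):
        spectrum = fft(signal[start:start + window])
        # Skip bin 0 (the constant offset) and the mirrored upper half.
        magnitudes = [abs(c) for c in spectrum[1:window // 2]]
        peaks.append(1 + magnitudes.index(max(magnitudes)))
    return peaks

def tone(bin_index, window=64):
    """Hypothetical test input: a pure sine wave centered on one FFT bin."""
    return [math.sin(2 * math.pi * bin_index * t / window) for t in range(window)]

# Three "notes" played one after another stand in for a song.
signal = tone(4) + tone(9) + tone(4)
fingerprint = dominant_bins(signal)
print(fingerprint)  # [4, 9, 4]
```

Because the fingerprint records which frequency dominates rather than how loud it is, playing the same signal at a lower volume produces the same list of numbers. Real products track far more detail than one peak per slice.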
Once a record company establishes its database, it's ready to help identify songs for potential customers or to track down cases of copyright infringement. In either case, the software analyzes the unknown audio clip the same way it analyzed the songs in the company's catalog. It creates a hash, or short code, that depends on the content of the audio file, and assigns it to the clip as a digital fingerprint. It then compares that fingerprint to the fingerprints in the database. Next, we'll take a look at exactly how it determines whether the songs are the same.
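The catalog-then-lookup workflow described above might be sketched like this in Python. Everything here is hypothetical: the song titles, the lists of numbers standing in for analyzed audio, and the helper names `fingerprint_hash` and `identify`. The choice of SHA-1 from Python's standard `hashlib` module is also just an assumption for the example.

```python
import hashlib

def fingerprint_hash(features):
    """Collapse a list of numeric audio features into a short hex code."""
    payload = ",".join(str(value) for value in features).encode("ascii")
    return hashlib.sha1(payload).hexdigest()[:12]

# Step 1: the record company fingerprints its whole catalog once.
catalog = {
    "Song A": [4, 9, 4, 7],
    "Song B": [3, 3, 8, 2],
}
index = {fingerprint_hash(features): title for title, features in catalog.items()}

# Step 2: an unknown clip goes through the same analysis and is looked up.
def identify(features):
    """Return the matching title, or None if the hash is not in the index."""
    return index.get(fingerprint_hash(features))

print(identify([4, 9, 4, 7]))  # Song A
print(identify([4, 9, 4, 8]))  # None
```

The second lookup shows the limit of this approach: an exact hash only matches when the analyzed features are identical, so a single differing value means no match at all. That is why practical systems compare fingerprints for similarity rather than strict equality.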
Identifying the Sound
Often, sound clips being analyzed are not clean copies of a song. The song could be truncated, or it might be similar to a different song. This is where algorithms come in handy. The algorithm's job is to compare the fingerprints and determine if the incoming sound clip matches a song (or portion of a song) in the database within a certain range of probability. The identification process is similar to the way forensics experts once matched a suspect's fingerprints to those found at a crime scene. Before sophisticated computer software and advanced methods for examining fingerprints became available, experts would look for points of similarity between different fingerprints. In most cases, the specialist would need to demonstrate at least 16 points of similarity for a print to be considered a match.
Photo courtesy of stock.xchng: The software matches fingerprints that represent a sound's waves to try to get a match.
There is no standard probability range for content-recognition software. Most programs allow customers to adjust the level of similarity required to declare a match. For example, you could adjust the program so that it only brings back match results if the algorithm determines that there is a 95 percent or better chance it's a match. If the incoming clip doesn't fall in that range, the program sends an error message to the user. When the program determines a match, a partnered application can take over. The application might send information to someone who wants to know the title of a song, or it might flag a song on a Web site and e-mail the corresponding record company's legal department. Some record companies have used such software to scan file-sharing sites or to track content on Web sites that stream audio. The entire process of analysis and matching takes only a few seconds.
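Here's a small Python sketch of matching with an adjustable threshold. It makes some loud assumptions: fingerprints are plain lists of numbers, "similarity" is simply the fraction of positions that agree (a stand-in for whatever probability a real product computes), and the names `similarity` and `best_match` are invented for this example. The clip is slid along each stored track so that a truncated clip can still be found.

```python
def similarity(a, b):
    """Fraction of positions where two equal-length fingerprints agree."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_match(clip, database, threshold=0.95):
    """Return (title, score) for the best alignment, or None below threshold."""
    best = None
    for title, track in database.items():
        # Try every position where the clip could sit inside the track.
        for offset in range(len(track) - len(clip) + 1):
            score = similarity(clip, track[offset:offset + len(clip)])
            if best is None or score > best[1]:
                best = (title, score)
    if best is not None and best[1] >= threshold:
        return best
    return None

database = {
    "Song A": [4, 9, 4, 7, 2, 5, 8, 1, 6, 3],
    "Song B": [3, 3, 8, 2, 9, 9, 1, 4, 6, 6],
}

clean_clip = [4, 7, 2, 5]   # an excerpt from the middle of Song A
noisy_clip = [4, 7, 0, 5]   # the same excerpt with one value corrupted

print(best_match(clean_clip, database))                 # ('Song A', 1.0)
print(best_match(noisy_clip, database))                 # None
print(best_match(noisy_clip, database, threshold=0.7))  # ('Song A', 0.75)
```

Lowering the threshold lets the noisy clip through, which is the trade-off a customer controls: a strict setting misses degraded copies, while a loose one risks flagging songs that merely sound alike.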
In the next section, we'll look at how video content presents different challenges than audio files.
Content-recognition Software - Video
Recently, Time Warner and Disney partnered with YouTube to test video content-recognition software developed by Google. The software is similar to existing audio content-recognition programs in that it analyzes content to create a fingerprint. Then it compares that information to fingerprints in a database to determine if there is a match. However, video presents unique challenges that are not easily overcome. For example, most videos on YouTube are limited to 10 minutes or 100 megabytes. Since a clip could include any 10-minute segment from a film or television show under copyright, the content-recognition software must analyze the entire original work in such a way that it can make meaningful matches from a relatively small sample clip. Google isn't saying much about how the software manages this, but it's likely that the program analyzes overlapping chunks of the original content to create multiple fingerprints.
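Since Google hasn't described its method, the following Python sketch only illustrates the general idea of overlapping chunks and should not be read as how the real software works. It assumes each frame has already been reduced to a single number, and the names `overlapping_chunks`, `build_index` and `locate`, along with the chunk size and step, are made up for the example.

```python
def overlapping_chunks(frames, size, step):
    """Yield (start, chunk) windows of `size` frames, advancing by `step`."""
    for start in range(0, len(frames) - size + 1, step):
        yield start, tuple(frames[start:start + size])

def build_index(frames, size=4, step=2):
    """Map each chunk of the original work to the position where it starts."""
    return {chunk: start for start, chunk in overlapping_chunks(frames, size, step)}

def locate(clip, index, size=4):
    """Return where the clip begins in the original work, or None."""
    for offset in range(len(clip) - size + 1):
        chunk = tuple(clip[offset:offset + size])
        if chunk in index:
            return index[chunk] - offset
    return None

# Hypothetical per-frame signatures for a 20-frame "film".
film = list(range(100, 120))
index = build_index(film)

clip = film[7:13]  # an uploaded excerpt that starts mid-chunk
print(locate(clip, index))                 # 7
print(locate([1, 2, 3, 4, 5, 6], index))   # None
```

Because the chunks overlap, the excerpt doesn't have to begin on a chunk boundary. In this sketch, any clip at least `size + step - 1` frames long is guaranteed to contain one complete indexed chunk.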
Photo courtesy of stock.xchng: Analyzing video is more challenging than analyzing sound.
Video content-recognition software must be able to identify footage even if the person who uploaded the content edited it first. For example, people can fool software that matches color resolution by tweaking the color saturation in a video. Cropping a video or uploading footage of a film captured on a video camera can also fool recognition software. Some pirated films are captured on cameras set up at an angle to the screen, further complicating the identification process.
One approach developers are trying is to base fingerprints on an analysis of how motion changes over the course of a video. Even this could prove ineffective if someone uploads a pirated video captured on a hand-held camera, since the camera's own movement adds motion that isn't in the original. In some cases, the probability range for matches may need to be fairly wide to flag all possible cases of piracy. Film studios may discover that they will still need a real person to review video clips to confirm a case of infringement. Still, the initial identification of potential video piracy will be much more efficient.
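As a toy illustration of a motion-based fingerprint, the Python sketch below measures how much the picture changes from one frame to the next and records only whether that amount went up or down. It is a guess at the flavor of the technique, not a description of any shipping product. Frames are assumed to be short lists of pixel brightness values, and `motion_energy` and `motion_fingerprint` are hypothetical names.

```python
def motion_energy(prev_frame, next_frame):
    """Total absolute change in brightness between two frames."""
    return sum(abs(b - a) for a, b in zip(prev_frame, next_frame))

def motion_fingerprint(frames):
    """Record, step by step, whether motion increased (1) or not (0)."""
    energies = [motion_energy(a, b) for a, b in zip(frames, frames[1:])]
    return tuple(int(later > earlier)
                 for earlier, later in zip(energies, energies[1:]))

# A tiny hypothetical video: five frames of four pixels each.
original = [
    [10, 10, 10, 10],
    [10, 12, 10, 10],
    [30, 40, 10, 10],
    [30, 41, 10, 10],
    [90, 41, 50, 10],
]

# Two altered copies: one uniformly brightened, one with contrast halved.
brightened = [[pixel + 20 for pixel in frame] for frame in original]
dimmed = [[pixel * 0.5 for pixel in frame] for frame in original]

print(motion_fingerprint(original))    # (1, 0, 1)
print(motion_fingerprint(brightened))  # (1, 0, 1)
print(motion_fingerprint(dimmed))      # (1, 0, 1)
```

The altered copies produce the same fingerprint because brightening or dimming every frame doesn't change where the action speeds up or slows down. A shaky hand-held recording would add its own frame-to-frame changes and could scramble this pattern.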
Video-identification software is still in the testing stage, though some companies are already holding effective demos of their programs. Challenges in identification won't end once the software is perfected, though. The sheer volume of video content presents a big problem. Movie and television studios will need to constantly update their databases with fingerprints for all the new content that comes out every day. While the process for uncovering piracy may become more efficient, it will still require constant upkeep and maintenance.