BLOG

Audio Enhancement: Removing a Single Sound

August 24th, 2015

One of the most common audio issues that I address during an enhancement is noise and other extraneous sounds. The noise floor is usually consistent throughout the recording and can be removed to varying degrees by using noise reduction software. The most complicated issues are the extraneous sounds that are not continuous. These sounds could include anything from a plane flying overhead to someone whistling while people talk. These sounds are difficult to pinpoint with standard tools like noise reduction and equalization, but they can be identified using a spectrogram.

A spectrogram shows both the frequency content of a recording and the level of those frequencies over time. It may be the most helpful tool available to an Audio Forensic Expert because it visually presents everything that is happening throughout the audio in one window. Using this, the expert can both identify and address individual harmful noises in the recording. With the right software, these individual sounds can be selected and removed without affecting any other part of the recording. It is important to remember that there is a right and a wrong way to do this, which is why only a trained Audio Forensic Expert should be hired to complete an enhancement for use in court.

When processing audio, it can be easy to introduce artifacts into the recording. Artifacts are unwanted noises produced by various processing and compression techniques. Considering the goal of an audio enhancement is to eliminate extraneous noise, introducing artifacts is the exact opposite of what you want when working with a recording. Many things can introduce artifacts, but the simplest way to describe the cause is over-processing. By over-processing, I mean using extreme settings within individual audio tools.

For example, I often work with audio evidence that is extremely quiet. This often requires a gain increase of portions where only voice content exists. If the gain is increased too much, it can cause clipping of the audio output. When this occurs, the edges of the waveform are essentially clipped off, producing a distorted and noisy audio signal. The end result is a less intelligible voice than the original, essentially defeating the purpose of the whole process.
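To make the clipping effect concrete, here is a minimal sketch in Python with NumPy. This is not forensic software, just a synthetic tone standing in for quiet voice content: a moderate gain boost leaves the waveform intact, while an extreme boost slams it into the system's ceiling and flattens the peaks.

```python
import numpy as np

# A synthetic "quiet voice": a 200 Hz tone at low level, 8 kHz sample rate.
sr = 8000
t = np.arange(sr) / sr
quiet = 0.05 * np.sin(2 * np.pi * 200 * t)

def apply_gain(signal, gain_db, limit=1.0):
    """Apply gain in dB, then hard-clip anything beyond the system's ceiling."""
    boosted = signal * 10 ** (gain_db / 20)
    return np.clip(boosted, -limit, limit)

moderate = apply_gain(quiet, 20)   # peaks at 0.5: clean
extreme = apply_gain(quiet, 40)    # peaks would reach 5.0: flattened at 1.0

clipped_fraction = float(np.mean(np.abs(extreme) >= 1.0))
```

With the extreme setting, most of the waveform sits flat against the limit; those squared-off edges are the distortion that makes the voice harder, not easier, to understand.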

When adjusting individual ranges of frequencies on the spectrogram, it is very important to be aware of artifacts. Being able to recognize artifacts and know the limitations of what processing can be done is what makes an Audio Forensic Expert necessary. When isolated portions are processed with a trained ear and the right knowledge, noise can be eliminated and voices can be brought out without introducing any artifacts.

I recently worked on an audio recording that had a siren present during a portion of talking. Because it was so loud, it made the underlying dialogue difficult to hear. Luckily, the siren could be isolated in the recording. By selecting only the siren and then decreasing the gain a moderate amount, the voices became more audible while still avoiding any artifacts.
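As an illustration of the general technique (a simplified sketch with synthetic stand-in tones, not the specific tool or settings used on that case), a steady sound can be selected in the time-frequency domain and attenuated without touching the surrounding content. The sketch below uses SciPy's STFT, treating a 1.5 kHz tone as the "siren" and a 220 Hz tone as the "voice":

```python
import numpy as np
from scipy.signal import stft, istft

# Synthetic stand-ins: a low "voice" tone plus a loud steady "siren" tone.
sr = 8000
t = np.arange(2 * sr) / sr
voice = 0.3 * np.sin(2 * np.pi * 220 * t)
siren = 0.8 * np.sin(2 * np.pi * 1500 * t)
mix = voice + siren

# STFT: rows are frequency bins, columns are time frames.
f, frames, Z = stft(mix, fs=sr, nperseg=512)

# "Select" the siren region (1.2-1.8 kHz, here across all frames) and
# attenuate only that region by 20 dB, leaving everything else untouched.
region = (f >= 1200) & (f <= 1800)
Z[region, :] *= 10 ** (-20 / 20)

_, cleaned = istft(Z, fs=sr, nperseg=512)
cleaned = cleaned[: len(mix)]

def band_rms(x, lo, hi):
    """RMS of the FFT magnitude inside the band [lo, hi] Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    band = (freqs >= lo) & (freqs <= hi)
    return float(np.sqrt(np.mean(np.abs(spec[band]) ** 2)))

siren_drop = band_rms(mix, 1400, 1600) / band_rms(cleaned, 1400, 1600)
voice_ratio = band_rms(mix, 150, 300) / band_rms(cleaned, 150, 300)
```

The "siren" band drops by roughly a factor of ten while the "voice" band is left essentially unchanged, which is the goal: a moderate, targeted gain reduction rather than an aggressive one that would introduce artifacts.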

Audio Forensic Experts have a plethora of tools at their disposal, which is making audio enhancements more and more effective. There are some things to be cautious of when enhancing audio, but any technique that helps should be used as long as the science is sound.


Ted Rall and the L.A.P.D. – What Really Happened?

August 20th, 2015

Ted Rall

On May 11th, 2015, Los Angeles Times Freelance Political Cartoonist Ted Rall published an Op-Ed relating to an incident he allegedly faced with the LAPD back in 2001.

Rall claimed that, while he was being stopped for jaywalking in Los Angeles, an officer of the Los Angeles Police Department assaulted him. Rall described the officer throwing his driver’s license into the sewer, shoving him up against a wall, and handcuffing him. Rall went on to describe a crowd of onlookers surrounding him during this event, asking officers about the legitimacy of the arrest.

In response to the post, the Los Angeles Police Department presented a 14-year-old recording of the event. Based on this recorded evidence, Rall was fired from the L.A. Times. Since then, Rall has disputed the L.A. Times’ decision and has produced both an enhanced version of the audio and a transcript of what he believes can be heard in the recording.

The L.A. Times commissioned Primeau Forensics to examine and enhance the audio recording provided by the L.A.P.D. The audio evidence was analyzed with the goal of uncovering the events as they occurred. Primeau Forensics holds no bias toward either party and approached the investigation as such.

You can hear the enhanced audio recording below. Read Primeau Forensics’ transcript of the confrontation here.



Audio Forensic Synchronization – What Happened When?

June 24th, 2015

Generally speaking, any device that captures video is also capable of simultaneously capturing audio. This audio can be crucial to the Forensic Expert, as it can show the expert more clearly “what happened when” at a crime scene.

Picture this: a young man has just assaulted an older woman, and a police officer is in pursuit. As the young man begins to run from the police, another man on the street begins shooting video from his cell phone. The police officer is recording from both the police-car dash-cam and a body-worn camera, which he switched on when he began his pursuit.

As the young man continues to run, the officer announces “Taser!” and fires, activating the camera built into the officer’s stun gun. After sprinting around a corner, the young man is found dead. How did this happen? Who was responsible? When did the death occur? This is where audio can be a major asset.

As the appointed Forensic Expert, you are tasked with determining what you believe happened in this situation. The evidence available to you includes the cell phone video from the witness, the police officer’s dash-cam, the on-board camera from the taser, and the body-worn camera.

This is where the audio comes in handy, as it allows you to place the recordings in chronological order. Begin with the recording that starts the earliest. In this case, it would be the police officer’s dash-cam, as it is always running. Next, find the point in that audio where the witness’ footage starts. Listen for a distinctive sound or shouted phrase for reference; when you find that same sound in the officer’s dash-cam video, you’ll know that this is where the witness began recording.

This can also be done by visually inspecting the waveform. Large, quick spikes in the level can make it very easy to quickly sync the audio. Most software will also allow you to zoom in closely on the waveform so you can line the waveforms up as closely as possible.
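That alignment step can also be automated with cross-correlation. The sketch below is a simplified illustration with synthetic audio (assuming a loud shared transient, such as a shout, appears in both recordings), not any particular forensic tool:

```python
import numpy as np

# Two "recordings" of the same scene: the dash-cam runs the whole time,
# while the witness phone starts recording 1.5 seconds later.
sr = 2000
rng = np.random.default_rng(0)
scene = rng.normal(0, 0.05, 10 * sr)   # 10 s of ambient scene audio
scene[3 * sr] = 1.0                    # a shared transient (a shout) at t = 3 s

offset = int(1.5 * sr)
dashcam = scene
witness = scene[offset:]               # the phone missed the first 1.5 s

def find_offset(reference, clip, probe_len):
    """Lag (in samples) at which the start of `clip` best lines up
    inside `reference`, found by sliding cross-correlation."""
    corr = np.correlate(reference, clip[:probe_len], mode="valid")
    return int(np.argmax(corr))

estimated = find_offset(dashcam, witness, 2 * sr)
# estimated / sr is the witness clip's start time on the dash-cam timeline
```

The correlation peaks where the two waveforms match, which is the same thing the eye does when lining up spikes in the two waveforms on screen.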

Next, you’ll want to find the point where the body-cam began recording. Again, this will require critical listening and visual analysis to align the sounds from the body-cam evidence and the witness video to get an idea of when the officer began recording.

From there comes the last piece of evidence: the taser camera. Remember when the police officer announced “Taser!”? The second the trigger on that stun gun is pulled, its on-board camera kicks on. The body-camera audio will give you an idea as to when the officer announced his stun-gun use, along with when the video clip begins.

This will give you the most accurate sequence of events. The taser camera is the most recent recording of the event, and in it you notice a loud sound that couldn’t be heard in the other recordings. That sound was a pistol, which another officer around the corner drew and fired in an attempt to stop the fleeing man. This is what the body camera, witness video, and dash-cam did not capture. However, thanks to the alignment of the audio, the expert is able to see, in chronological order, the events as they occurred.

Synchronization does not always go so smoothly. Sometimes different frame rates are used in different videos, which can alter the playback speed of each one. Most modern digital recording technology is self-resolving and does not have this issue, but there are still devices that do not. These can cause the video and audio to be in sync at one point, but slowly drift apart throughout the video. It’s important to be aware of this so adjustments can be made to ensure that all the events are synchronized as accurately as possible.
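One common way to correct such drift, sketched below under the simplifying assumption that the drift is a constant playback-speed offset, is to locate two shared events in both recordings and compute the speed ratio between the two timelines:

```python
# Two shared events located in both recordings (timestamps in seconds).
# In the reference they are 10 s apart; in the drifting recording they
# appear 10.02 s apart, so it runs slightly slow relative to the reference.
ref_event_1, ref_event_2 = 5.0, 15.0
drift_event_1, drift_event_2 = 5.0, 15.02

# Constant speed ratio between the two timelines.
speed_ratio = (drift_event_2 - drift_event_1) / (ref_event_2 - ref_event_1)

def to_reference_time(t_drift):
    """Map a timestamp from the drifting recording onto the reference timeline."""
    return ref_event_1 + (t_drift - drift_event_1) / speed_ratio
```

In practice the drifting recording would then be time-stretched by this ratio so the two stay locked from start to finish rather than only at one sync point.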

The audio makes it much easier to synchronize all of the pieces of evidence together. The audio can provide both auditory and visual cues, through viewing the waveform, to use as reference points so an accurate sync can be completed. With only video, the different perspectives and qualities would make it extremely difficult to find exact reference points to line up.

Not all cases will give the expert this much to work with, but when working with multiple clips of the same occurrence, having a critical ear can be invaluable to understanding the timeline of the situation. Video can be powerful, but its direct counterpart, audio, can be essential to untangling confusing and misleading investigations.

Visual Inspection

April 21st, 2015

Sound waves can tell us a lot about a recording. Like metadata, the visual elements of a sound wave can expose characteristics of an audio recording without even having to listen to it. These characteristics can be important, especially when it comes to detecting edits within audio evidence. The process of observing these characteristics is called visual inspection.

Visual inspection (a general term that comprises a variety of forensic tests like narrow band spectrum analysis) is a crucial part of an Audio Forensic Expert’s job. To understand how crucial visual inspection really is, it’s important to understand the concept and value of the noise floor.

The noise floor of a recording is the background noise and overall “ambience” present in it (usually unwanted sound). For example, if you’re recording yourself speaking on a street in New York City, standing in one place and speaking into a microphone, the sound of the cars going by, the conversations happening around you, and the overall city noise will all contribute to the noise floor.

If you’re standing in one spot recording that audio, the noise floor should never change, because the environment your audio device is picking up stays consistent the entire time. The second that noise floor is altered, you know you have an edit.

There are many ways to examine this. One of the most reliable ways to observe the noise floor is with what’s known as a spectrogram. A spectrogram reads the spectrum of an audio recording: to put it simply, it maps the contents of the recording to blends of color that represent its frequency content, in Hz, over time. You can see an example below.
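For readers who want to try this themselves, a spectrogram can be computed with freely available tools. The sketch below uses SciPy, with a synthetic recording (a steady low noise floor plus a 300 Hz tone) standing in for real evidence:

```python
import numpy as np
from scipy.signal import spectrogram

# A consistent synthetic noise floor with a 300 Hz tone on top.
sr = 8000
rng = np.random.default_rng(1)
t = np.arange(4 * sr) / sr
recording = rng.normal(0, 0.01, 4 * sr) + 0.2 * np.sin(2 * np.pi * 300 * t)

# freqs: one entry per row (Hz); times: one per column (s);
# power: the energy in each time-frequency cell. Plotting `power`
# as colors over (times, freqs) gives the familiar spectrogram image.
freqs, times, power = spectrogram(recording, fs=sr, nperseg=256)
loudest_hz = freqs[int(np.argmax(power.mean(axis=1)))]
```

Each column of `power` is one slice of time; a sudden change in the color pattern from one column to the next is exactly the kind of noise-floor discontinuity described above.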

Now, because the noise floor of a recording never changes, you can tell when you have an edit when the spectrogram shows a change in, or absence of, color. The noise floor will always stay consistent, so when there’s a short drastic change such as the one pictured below, you know you have an edit. This makes the recording inauthentic.

Spectrogram edit circled

There are, of course, other ways to visually detect edits. Even the sound wave itself can expose an edit.

All sound waves should be smooth and continuous. Even if someone were to loudly clap during an audio recording, the sound wave will still remain smooth and continuous. When you see gaps, or a wave that is not smooth and continuous with another piece of the audio file, you know you have an edit.
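This smoothness test can even be expressed numerically. The sketch below (a simplified illustration with a synthetic tone, not a forensic authentication tool) simulates a crude butt splice and shows that the edit point jumps far more from one sample to the next than any continuous waveform does:

```python
import numpy as np

sr = 8000
t = np.arange(2 * sr) / sr
smooth = 0.5 * np.sin(2 * np.pi * 100 * t)   # a continuous recording

# Simulate a crude edit: delete a chunk and butt-splice the remainder.
edited = np.concatenate([smooth[: sr // 2], smooth[sr // 2 + 777 :]])

def largest_jump(signal):
    """Largest sample-to-sample difference; a butt splice shows up as a spike."""
    return float(np.max(np.abs(np.diff(signal))))

# A 100 Hz tone at 8 kHz moves at most ~0.04 per sample; the splice
# jumps by an order of magnitude more.
```

A real examination involves far more than one number, but the principle is the same: continuous audio changes gradually, and a splice leaves a discontinuity.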

Though a critical ear is generally considered the most important part of Audio Forensics, a good eye for edits in visual inspection can teach you a substantial amount about the evidence you’re working with before even taking the time to listen to it. Visual inspection really comes in handy when trying to determine the authenticity of a piece of audio evidence and to make sure a proper chain of custody was kept throughout the distribution of audio evidence.

Using a Compressor for Audio Enhancements

April 8th, 2015

As an Audio Forensic Expert, knowing what tools are available to me and how they work is extremely important. One type of signal processor that I frequently use is a compressor. While this is often thought of as a tool for music production, it serves many functions in the Audio Forensic world. Like with most audio signal processors, it takes training and experience to operate compressors properly and effectively when enhancing audio. This experience also helps me determine whether or not the compressor is needed for enhancement.

A compressor is a device that automatically attenuates the gain of an audio signal. This means that when the audio reaches a certain level, the compressor will lower the gain of the audio signal. When the audio drops below this certain level, the compressor will stop attenuating. It is similar to a person manually adjusting the volume on a stereo as a song is playing. A benefit of a compressor is that it also has a ‘make up gain’ control. This allows the operator to raise the overall level of the audio after it has been attenuated. Through this process, the recording can be made louder without clipping or distorting the signal.
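The threshold, ratio, and make-up gain described above can be sketched in a few lines of Python. This is a deliberately simplified static compressor (real compressors also smooth the gain with attack and release times, which are omitted here), not any particular product:

```python
import numpy as np

def compress(signal, threshold=0.5, ratio=4.0, makeup_db=4.0):
    """Static compressor sketch: the amount by which a sample exceeds
    `threshold` is divided by `ratio`; `makeup_db` of gain then raises
    the whole signal back up afterward."""
    mag = np.abs(signal)
    over = mag > threshold
    out = signal.astype(float).copy()
    # Compressed magnitude: threshold + (excess / ratio), sign preserved.
    out[over] = np.sign(signal[over]) * (threshold + (mag[over] - threshold) / ratio)
    return out * 10 ** (makeup_db / 20)

# A loud "bark" peaking near 0.9 and quiet "talking" peaking near 0.3.
bark = np.array([0.9, -0.9, 0.8])
talk = np.array([0.3, -0.25, 0.2])

bark_out = compress(bark)
talk_out = compress(talk)
# The level gap narrows: the bark is attenuated, then make-up gain
# raises everything, so the quiet talking ends up louder than before.
```

Only the loud peaks are turned down; the quiet material passes through untouched and then benefits from the make-up gain, which is exactly the barking-dog scenario described below.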

I will typically use a compressor when certain sounds in a recording are much louder than the rest of the audio and I need to balance the overall volume. An example would be a dog barking occasionally throughout a recording that is peaking much louder than the people talking. Using a compressor, I can attenuate the level of the barking without affecting the level of the people talking. Once the louder signal has been attenuated, I can use make up gain to increase the overall level of the recording. This becomes extremely helpful when the sound source that needs to be heard is quieter than other sounds. I will often receive recordings where the conversation that needs to be heard is buried or behind another sound source, like a television or even other people in the room. By adding a compressor, I can decrease the difference in level between the two signals.

Compression is not always the best approach for an audio enhancement and in some cases, I avoid using it completely. One of the biggest issues in recordings is a loud noise floor. The noise floor is the sum of all of the extraneous and unwanted noises in the recording. As I mentioned before, sometimes this noise floor is louder than the desired sound and therefore compression helps make the desired signal louder with respect to the noise. In some audio, the noise is already quieter than the desired signal. In these cases, using too much compression can actually increase the level of the noise relative to the desired signal. This can actually make the desired signal more difficult to hear and hurt the overall enhancement.

This is why it takes training and experience to properly use a compressor. With the knowledge that I have gained from my 30 plus years as an Audio Forensic Expert, I know when to use and when not to use a compressor on audio. I also know how to properly use it so that I improve the quality of the audio instead of making it more difficult to hear.

Sir Paul McCartney is NOT Dead – Actual Science

March 20th, 2015

So, you may have heard the rumor that Sir Paul McCartney is dead. In fact, the evidence that has been presented over the years is quite entertaining. The ‘Abbey Road’ album cover with Paul barefoot, and songs like Strawberry Fields Forever or Revolution 9 that are purported to contain messages regarding Paul’s passing, have all been linked to McCartney’s alleged death.

The rumor is that Sir Paul McCartney died in a car crash and was replaced with a man named Billy Shears who won the Paul McCartney lookalike contest. I am writing this blog post today to end the years of rumors with actual science: voice identification testing. The full voice identification report can be viewed below.

This assignment began when I received a call from Paul DuBay, a Beatles fan from San Antonio, Texas. He retained me to conduct this voice identification test because he wanted to know the truth.

The goal of voice identification testing and speaker recognition is to compare the known and unknown voices using critical listening, electronic measurement, and visual inspection of sound wave formation and spectrogram. The software programs I used for this voice identification test include Adobe Audition, Sony Sound Forge and Easy Voice Biometrics. Biometric technology is used as a secondary voice identification and speech recognition tool.

I used various songs performed and recorded before and after 1966 that feature the voice of Sir Paul McCartney. I was asked to compare songs from both of these time frames (pre and post 1966) to determine if the current Sir Paul McCartney is the same person as the pre-‘Paul is Dead’ (PID) Paul McCartney.
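As a rough illustration of the kind of measurement such a comparison involves (a simplified sketch using synthetic tones as stand-ins for vocal samples, not the actual test procedure or software), the fundamental frequency of two samples can be estimated and compared via autocorrelation:

```python
import numpy as np

sr = 8000

def fundamental_hz(signal, sr, fmin=150, fmax=300):
    """Rough fundamental-frequency estimate via autocorrelation: the
    strongest repeat-lag within an assumed range for this voice."""
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1 :]
    lo, hi = int(sr / fmax), int(sr / fmin)
    best = lo + int(np.argmax(corr[lo:hi]))
    return sr / best

t = np.arange(sr // 2) / sr   # half a second of audio
# Stand-ins for two vocal samples: the same 220 Hz fundamental with
# different harmonic balance, as an aged voice might show.
sample_1963 = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
sample_2002 = np.sin(2 * np.pi * 220 * t) + 0.2 * np.sin(2 * np.pi * 660 * t)

f1 = fundamental_hz(sample_1963, sr)
f2 = fundamental_hz(sample_2002, sr)
```

Two samples from the same voice should return closely matching fundamentals even when the harmonic balance above them differs, which is the pattern the report below describes in its spectrogram comparisons.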


26 January 2015


Dear Mr. DuBay,

I am an audio and video forensic expert and have been practicing for over 30 years. I have testified in several courts (See updated CV attached) throughout the United States and worked on various international cases. My forensic practices for audio investigation include digital and analogue audio authentication, restoration and voice identification. As a video forensic expert, my practices include video authentication, restoration and identification. (Audio is mentioned first – note to Paul)

I received from you the following digital audio music files:

Song Title/Music File Name – LP Title:

  • ‘1963 I Saw Her Standing There.m4a’ – Please Please Me
  • ‘2002 I Saw Her Standing There(Live).m4a’ – Back In the U.S
  • ‘1965 I’ve Just Seen a Face.m4a’ – Help!
  • ‘1976 I’ve Just Seen a Face (Live).m4a’ – Wings Over America
  • ‘1964 Kansas City.m4a’ – Beatles For Sale
  • ‘1988 Kansas City.m4a’ – CHOBA
  • ‘1967 Sgt Pepper’s Lonely HCB.mp3’ – Sgt Pepper
  • ‘1964 Long Tall Sally.m4a’ – Beatles Second Album
  • ‘Too Many People.m4a’ – Ram

These songs were “before and after” samples from either the pre-publicity of Paul McCartney’s alleged death in 1966 or the post-‘Paul is Dead’ (PID) news story.

I am familiar with the theories that Paul McCartney was killed in a car accident in 1966. This is why you contacted me and asked that I perform voice identification and speaker recognition testing on various songs performed and recorded before and after 1966. You asked that I compare these songs from both of these time frames (pre and post 1966) to determine if the current Sir Paul McCartney is the same person as the pre-‘Paul is Dead’ (PID) Paul McCartney.

The goal of a voice identification test and speaker recognition is to compare the known and unknown voices using critical listening, electronic measurement, and visual inspection of sound wave formation and spectrogram. The software programs I used for this voice identification test include Adobe Audition, Sony Sound Forge and Easy Voice Biometrics. Biometric technology is used as a secondary voice identification and speech recognition tool.

I help my clients understand voice identification testing and speech recognition by using familiar voice examples. If you are in your office and a fellow employee comes into the office that you have worked with for many years and says ‘hello’, you recognize that voice without making eye contact. The same is true when you are at home and your spouse or even a relative comes in and says hello and begins talking to you. You know who the voice is before making eye contact because you are familiar with the voice. This is how critical listening examination is conducted during voice identification tests. I have been performing voice identification testing most of my career as an audio forensic expert. In fact, throughout the course of my career I have performed dozens of successful voice identification tests including a test for CNN on the voice of Apple’s ‘Siri’.

When beginning a voice identification test, I first become extremely familiar with both the known and unknown voices and list all similarities as well as differences during this repeated listening. In the case of Sir Paul McCartney, I listened to all songs repeatedly during this critical listening phase. I also measured and viewed the sound spectrum and wave formation repeatedly to arrive at my professional conclusion.

The following report will include descriptions of the similarities observed during critical listening, electronic measurement, visual inspection as well as biometric testing. I have not observed any differences in any of the voice samples tested.

I began by downloading the digital audio files that you sent onto my forensic computer then opened all using Adobe Audition CS 5.5. Next, I began critical listening to all of the vocal samples multiple times to become extremely familiar with all voice samples presented.

I focused first on the two samples of ‘I Saw Her Standing There’ as they were superior audio samples that also include a vocal number count at the beginning. Next, I focused on ‘Long Tall Sally’ and ‘Sgt Pepper’s Lonely Hearts Club Band’ as the vocal delivery in each is extremely similar and unique.

The beginning count (1-2-3-4) of the ‘1963 I Saw Her Standing There.m4a’ version and the beginning count (1-2-3-4) from the ‘2002 Live Version of I Saw Her Standing There’ are identical, which indicates the rhythm of Sir Paul’s internal metronome is the same. The vocal range and phrasing in both samples are also the same. The slight difference in vocal tone is attributed to the age difference of Sir Paul, as he has matured over the years and so has his voice.

The spectrogram image below shows an exact match in frequency spectrum in spite of the difference in years:

Screen Shot 2015-03-20 at 11.17.00 AM

In the above image, the spectral frequency display is shown in color in the lower half of the image. The yellow, brighter colors indicate the stronger, higher-volume frequencies present in that portion of the audio, while purple and black colors represent frequencies that are weaker or lower in volume. The audio on the left side of the image is the original recording of the song from 1963 and the portion on the right is the more recent recording from 2002. These can be heard in the comparison audio work product attached to this report.

When closely examining the formant frequencies shown in the spectral display above, it is noted that they are nearly identical. Formants are resonances, or spectral peaks, created by a human voice. These are the frequencies that have the highest presence in a person’s voice and determine most of the tonal qualities of that individual voice. Because the formants in both recordings are almost identical, I conclude beyond a reasonable degree of scientific certainty that they are the same voice, Sir Paul McCartney. The slight variations can be explained by the age difference of the voice between the recordings.

The next spectrogram image below shows the sample ‘Wooo’ from the chorus of ‘I Saw Her Standing There’. The original recording from 1963 is displayed on the left and the newer recording from 2002 is shown on the right. Note the fundamental frequencies that are circled in blue in both recordings. Through close examination (narrow band spectrum analysis), it is clear that the fundamental frequencies, harmonics and the range of the frequencies in the early recording and more recent recording are identical. Critical listening also revealed no differences between the two samples, which can also be heard in the comparison audio work product attached to this report. Therefore, through both visual inspection and critical listening, I have determined that the voice in each sample of ‘I Saw Her Standing There’ is the same voice and that of Sir Paul McCartney.

Paul McCartney

I continued investigating all songs submitted for voice identification testing and found similar results and arrived at the same conclusion. All vocals from all songs submitted are that of Sir Paul McCartney.

My next comparison was between the ‘1964 Kansas City’ Recording and the ‘1988 Kansas City’ recording. This is shown in the spectrogram image below:

Paul McCartney

The sections that I chose to test are circled. They are the words ‘Kansas City’. The recording to the left is the recording of the song from 1964 and the recording of the song on the right is from 1988. Through close visual inspection of the prominent frequencies in the words ‘Kansas City’, I found that both the fundamental frequencies and the frequency ranges are again nearly identical. I also used critical listening to further support my findings and have determined that the voice from the 1964 recording ‘Kansas City’ is identical to the voice in the 1988 recording. The vocal expression, pronunciation of the words and voice range are an exact match. I continue to conclude that all vocals from all songs submitted are that of Sir Paul McCartney.

The voice tones of all songs examined, old and new, are extremely close and often identical when listening critically and viewing the narrow band waveform and frequency spectrum. Songs that were recorded farther apart in time have some small differences, which can be explained by Sir Paul’s difference in age when they were recorded. As people age, their voices change and so do their bodies. Vocal cords mature and voices usually grow deeper. Think of a boy going through puberty: the vocal cords mature and so does the voice. The same applies to people who enter their later years. Even though there are these slight differences, fundamental parts of the voice always remain the same.

I believe, and will prove scientifically, that a person’s singing voice is as unique if not more unique than their speaking voice. In the following paragraphs I will compare the pre-1966 Paul singing style with the post-1966 Paul singing style by critical analysis of Sir Paul’s vocal timbre and very loud and distinct voice. I have made observations while critically listening to upper register, near falsetto, voice signatures for the songs Long Tall Sally and Sgt Pepper’s Lonely Hearts Club Band. I will use the studio recorded versions of both songs; however, I have chosen the deconstructing Sgt Pepper isolated vocal to compare to the Long Tall Sally vocal. This back-to-back comparison file will be available in my audio work product attached to this report.

In Long Tall Sally, pre-1966, Sir Paul’s voice is very forceful and distinct. His vocal range and style of delivery are exact. His O’s in the lyric ‘OOOUU Baby- some fun tonight’ match his vocal style in the first verse of Sgt Pepper’s Lonely Hearts Club Band.

Furthermore, listening to the 1973 studio recording of ‘Too Many People’ during the intro near falsetto adlib ‘piss off cake ay ay ay…ooooh’ at the beginning of the song, I clearly hear the exact same falsetto vocal Sir Paul delivered during his entire career with the Beatles and solo. It is an extremely unique style of singing that can only be produced by the real Sir Paul McCartney. See audio work product attached to this report. All vocals from all songs submitted are that of Sir Paul McCartney.

In the image below, Sgt Pepper is on the left and Long Tall Sally is on the right. Notice the sound spectrum ‘fingerprints’ between each vocal sample are nearly identical in display. Considering that these are different songs, this is a very significant identifier that both vocals are sung by Sir Paul McCartney.

Paul McCartney

I have also loaded isolated segments from ‘Sgt Pepper’ and ‘Long Tall Sally’ into a voice biometrics software program which is capable of taking unique but different voice samples and comparing them biometrically resulting in a percentage of certainty for identification.

I loaded isolated sections of Sir Paul from the beginning of ‘Long Tall Sally’ and the isolated first verse of Sir Paul’s vocal from ‘Sgt Pepper’s Lonely Hearts Club Band’ into Easy Voice Biometrics. The test resulted in a 53% match. I believe the percentage rate is high enough to confirm a positive identification, even though by biometric software standards we would like to see a higher percentage of certainty. In my opinion, this is due to the different words being sung and measured in two different songs. See screen shot of EVB test result below:

Paul McCartney

The biometric test was done as a secondary test to determine the voice similarities using another voice identification and speaker recognition testing process. Critical listening is the primary voice identification tool that I used to arrive at my conclusion (please see and hear audio work product attached to this report).

Conclusion

Listening to people like Dick Clark and Ringo Starr speaking as well as singing through the years, you can hear how their voices have matured yet are identifiable as being from the same person. This maturity is why I point out that Sir Paul McCartney’s voice has also matured. Sir Paul has an incredible voice that is extremely unique and, based on my 31 years’ experience as an audio forensic expert and scientific forensic testing, there is no other voice in the world that comes close to sounding the same or measuring spectrographically the same as Paul McCartney.

Through careful analysis of the waveform and spectrogram as well as critical listening and biometric measurement, I conclude beyond a reasonable degree of scientific certainty that the voice heard in all of the song samples examined is that of Sir Paul McCartney. This voice identification test confirms that the ‘Paul is Dead’ rumor is not true.

This concludes this voice identification testing.

Respectfully submitted,

Edward J. Primeau, CCI, CFC



How to Set Up a Microphone for CCTV Systems

March 10th, 2015

Closed circuit television systems have become a major contributor of evidence to court cases. While the video from these systems is often very important, the audio can play just as much of a role in the investigation. At Primeau Forensics, we are often hired to enhance not only the video from a CCTV system, but the audio as well. Clients typically hire us for enhancements because the original CCTV system was not set up properly and was capturing less than ideal quality audio and video. Many times, the audio is more valuable than the video because of what was said during the event. While enhancing the audio after the fact is possible, setting up the system’s microphone correctly beforehand can be extremely beneficial when an incident does occur. Getting a good, clean signal from a microphone relies on two key principles: microphone gain and microphone placement.

Setting a proper gain structure for a microphone will always yield the best result for any kind of recording. Gain is applied to microphones because microphone signals are inherently low in level. A preamp is used to amplify the signal before it is recorded into a system. When setting the gain, the goal is to get a high enough level that the signal is audible, while also making sure that the level does not clip the system or preamp. Clipping means that the signal has exceeded the capabilities of the system and begins to distort. This distortion hurts the quality of the audio and can make it very difficult to understand what people are saying in a recording.

Gain structure is often set based on the input signal, which makes setting a surveillance system microphone difficult. The input signal of a surveillance system is always changing and cannot be manually reset whenever people enter or leave the area. When setting the gain for a surveillance system microphone, it is usually a good idea to test different levels of sound in the room. Having someone talk or even yell in the room while you set the level can ensure that the recording will not clip when it is recording later on.

We recently recovered some surveillance video evidence that required an audio enhancement. When we received the audio, we found that the gain had been set too high and the entire recording had clipped. We also found that because the room was small and filled entirely with hard surfaces, there was a buildup of reverberant sound. Reverb consists of reflections of sound off surrounding surfaces. Some reverb is always present, but too much can begin to cover up the direct sound, which is the original sound coming straight from the source, such as a person speaking. In this case, the gain should have been set much lower on the microphone, which would have produced a much cleaner and more audible recording. Reverb is a more difficult issue to combat and relies much more on the microphone placement.

Different microphones have different pickup patterns, meaning they are sensitive to sound arriving from different directions. Typical microphones used for surveillance systems are either cardioid or omnidirectional. Cardioid microphones pick up sound from one direction and reject sound from the opposite direction, while omnidirectional microphones pick up sound from all directions. Knowing what kind of microphone your system uses is the first step in setting it up. A directional microphone should be aimed at the area where the sound sources will be; omnidirectional microphones are easier to set up because they do not need to be aimed in any direction.

When placing the microphone, it is important to be aware of other extraneous noises in the room. We often see CCTV systems placed near the ceiling and in corners so they can obtain the best viewpoint of the area. Depending on how the room is designed, this is not always the best location for the microphone. If an air vent or another electrical device is near the microphone, it will add a large amount of noise to the recording and can cover up the desired sound. If a directional microphone is being used, try aiming its rejection side at the unwanted sound source so that as little of the unwanted signal as possible is picked up.

Reverb can also be an issue in smaller spaces that have no absorptive surfaces. When the direct sound is buried by the reverb, the desired signal becomes muddy and undefined. Acoustic treatment can be added to a room to deaden the reverb, but this is more often an approach for musical spaces. A typical fix for a surveillance system is placing the microphone closer to the desired sound. Placing the microphone in the center of the ceiling instead of in a corner can allow it to pick up more of the direct sound, resulting in better and clearer audio. Because sound and reverb tend to build up in corners, placing the microphone away from the corner will also prevent it from picking up those extra reflections.

Audio evidence is a very prominent part of many investigations and court cases. Setting up a microphone properly for a surveillance system can often make a huge impact on whether that audio can be used as evidence or not. As an audio forensic expert, I come across many audio recordings that could have been much more audible if the system had been set up properly. When installing a surveillance system, setting the gain for the microphone and placing the microphone properly will always improve the quality of the recorded audio.

Blindspot – How to Make Digital Audio Recordings for Evidence

February 19th, 2015

As an Audio Forensic Expert, my day-to-day activity includes forensic services like audio enhancement and authentication, as well as voice identification. Audio enhancement is probably the most common service I provide, because more often than not, the audio evidence was not recorded in the best way possible. Audio evidence can often be one of the most important pieces of evidence for a case, so it should always be given a great deal of attention.

One of the most common ways people create digital audio evidence is by using digital audio recorders. Law enforcement will often use them for interrogations and confessions, and sometimes even out in the field as a backup for their dash cam or body cam. People outside of law enforcement use them for creating audio evidence as well.

I would like to mention that concealed audio recordings are not always legal. Federal law states that creating an audio recording requires only one person's consent, but some states follow a 'two-party consent' law. This law means that all parties on the recording must give permission to the person recording in order for it to be used as evidence in court. I highly suggest looking into your own state's laws regarding concealed audio recordings before making one.

When creating a digital audio recording that is going to be used in court, there are many things one should be aware of before making the actual recording. The biggest issue I usually come across is low recording levels. While it is possible to increase the signal level afterward through forensic audio enhancement, doing so costs unnecessary time and money and also raises the noise floor of the recording, which can make it more difficult to hear what is happening. Creating a clean and audible original recording makes the enhancement process much easier and can often make the evidence much more useable in court.

When preparing to make an audio recording, regardless of whether it is a concealed recording or an interrogation recording, the user should always look at the settings of the digital audio recorder.

Two major settings determine the quality of a digital audio recording: sample rate and bit depth. Together, these settings also determine the bit rate of a recording. Changing these settings will affect both the quality of the audio recording and the amount of space used on the digital recorder. When creating digital audio evidence, it is necessary to balance these two in order to get a high quality recording while optimizing the amount of space on the digital recorder. Thankfully, many digital audio recorders will record in lossy compressed formats like MP3 files, which take up much less space and don’t sacrifice a lot of quality.

When recording digital audio in an MP3 file format, the two key settings to pay attention to are the sample rate and the bit rate. The sample rate ultimately determines the range of frequencies the recorder can capture. At least two samples are needed to record any frequency, which means the sample rate must be at least twice as high as the highest frequency you need to record.

The range of human hearing is roughly between 20Hz and 20kHz. Typical audio recordings are done at 44.1kHz to capture the full range of human hearing. While this is standard for music and other professional recordings, it is not always necessary for audio evidence.

Most fundamental frequencies of the voice are between 100 and 500Hz with some of the most important harmonic content between 1kHz and 4kHz. This means that a sample rate as low as 8kHz can sometimes be adequate for recording a conversation, which will also save a large amount of space on the digital recorder.
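The two-samples-per-cycle rule above (the Nyquist criterion) can be expressed as a one-line calculation. The frequency figures below are the ones from the text; this is a sketch, not a recorder configuration tool.

```python
def min_sample_rate(highest_freq_hz):
    """Nyquist criterion: at least two samples per cycle of the
    highest frequency that must be captured."""
    return 2 * highest_freq_hz

# The full range of human hearing (~20 kHz) needs at least a 40 kHz rate;
# the 44.1 kHz standard comfortably covers it.
print(min_sample_rate(20_000))  # 40000

# Speech, whose important harmonic content tops out near 4 kHz, can
# survive an 8 kHz rate, which also saves space on the recorder.
print(min_sample_rate(4_000))   # 8000
```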

Bit rate determines the number of bits that are processed per second, which determines the fidelity of the audio. Typical MP3 files are recorded between 192kbps (kilobits per second) and 320kbps, but they can be as low as 32kbps. Just like with the sample rate, a higher bit rate means higher quality audio but also a larger file size. The issue that arises with low bit rates is that the compression process applied to the file can start creating digital noise in the recording. This digital noise can often cover up parts of the recording, and once it is there, it is very difficult to remove.
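Because the bit rate is simply bits per second, the file-size trade-off is easy to estimate. The one-hour duration below is an illustrative assumption, not a figure from the text.

```python
def mp3_size_mb(bitrate_kbps, minutes):
    """Approximate MP3 file size: bit rate (kilobits/s) times duration,
    converted from bits to megabytes."""
    bits = bitrate_kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000       # bits -> bytes -> megabytes

# One hour of audio at the low, typical, and high MP3 bit rates:
for kbps in (32, 192, 320):
    print(f"{kbps} kbps for 60 min is about {mp3_size_mb(kbps, 60):.1f} MB")
```

The 32kbps setting uses roughly a tenth of the space of 320kbps, which is exactly why it is tempting, and exactly where compression noise becomes a risk.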

When determining what settings to use on a digital recorder, it is always a good idea to make multiple test recordings before making an audio recording that will be used as evidence. These test recordings will let you try out the various settings and then listen back to see what sounds best and what fits your needs the most.

Another setting that is sometimes included on digital recorders is the ‘voice activation’ setting. This setting will start and stop the recording based on the amount of signal the microphone is picking up. While it can be a good way to save space on the recorder, it is not recommended that this setting be used when creating any kind of digital audio evidence. If this setting is on, the digital recorder could stop recording at a key moment in the conversation and miss a crucial piece of evidence. If extra space is needed on the digital recorder, adjusting the quality settings is a much better way to go. Recording all of the content at a slightly lower quality is a lot safer than relying on the ‘voice activation’ setting and missing important content.

Monitoring the battery life on the digital recorder is another very important thing to keep in mind. In some applications, like recording an interrogation, the digital audio recorder can simply be plugged into the wall so it will not run out of power. In other cases where you do not have this option, make sure the battery is fully charged or you have put in new, good quality batteries. Keeping extra batteries with you is also good practice, just in case the recorder does run out of battery and needs a replacement.

When creating the actual recording, try to be as close as possible to the person being recorded. As I mentioned before, one of the biggest issues with audio evidence is a low volume or record signal level. The farther away from the source the microphone is, the lower the signal level and the lower the signal to noise ratio. This means that less of the desired signal and more of the unwanted background noise will be recorded. Background noise can include any extraneous sounds such as furnaces, refrigerators, air conditioners, televisions or even the internal sound created by the digital recorder itself. These sounds can detract from the quality of the recording and often make the desired signals unintelligible.

Placing the digital recorder in a good location is key for making a good digital audio recording. Keep a few things in mind when making your recording. First, the microphone should always be aimed at the subject that you are recording. When placing the recorder in a pocket or a purse, aim the microphone towards the subject. Also make sure that the digital recorder is relatively stable in its location, because any movement of the recorder will be picked up by the microphone and can cover up other parts of the recording. Pay attention to any materials that may be in between the microphone and the sound source; the thicker the material, the more damping there will be on the signal, which will decrease the record level.

Many digital audio recorders have a microphone input which allows you to use an external microphone. The external microphone is always the best option to use if the recorder is going to be placed inside something. When using this option, it is always a good idea to use a high quality external microphone.

There are many different types of microphones that will work better for different situations. Lavaliere microphones are extremely helpful because they are small and usually omnidirectional. This means that they will pick up sounds from all directions and they can be placed anywhere on your person while the digital recorder stays in your purse or pocket. Other microphones, such as directional microphones, may work better during police interrogations because the subject will not be moving during the recording.

As I mentioned before, always create a test recording before making the recording that will be used as evidence. Testing different microphones, microphone placements and locations will help you learn how your digital recorder works and responds to different environments. If possible, try conducting the test recording in the same place that you will create the real audio evidence so you can prepare for any extraneous background noises and other obstacles. After making the test recordings, listen back so that you can make sure the desired sounds can be heard and the sound quality is high enough.

 

Voice Identification: Characteristics of an Unknown Voice

January 12th, 2015

One of the most important elements of Voice Identification is the ability to recognize the characteristics of the human voice. There are many ways to distinguish these characteristics, some aural, some visual.

Think about when you have your back to a person who enters the room and says hello. If it is your child, spouse or co-worker, I bet you recognize them immediately because you are familiar with their voice. This is the starting point for voice identification; becoming familiar with the characteristics of the unknown voice.

I began editing spoken word on reel-to-reel tape with razor blades and splicing tape. I had to learn to visualize the words in my mind's eye in order to cut the tape in the right place. Today, we have software programs that display the waveform and sound spectrum of the spoken words, which makes the editing process far easier. The editor can see the way the words look on the computer screen while deciding where to make the edit and connect the sentences, removing the stutters, coughs, gaps and mistakes.

During the editing process, you will learn to listen for voice characteristics almost subconsciously. These characteristics include the way the words are spoken; word, vowel and consonant pronunciations; the recording's noise floor (unwanted background noise); the way the words flow together; and significant patterns of speech you may detect, such as accent, dialect, impediments, nasal cavity resonance, voice tone, inflection and pacing.

Pay attention to both differences and similarities from recording to recording, and take notes on your observations, building a speech database to reference when writing the report.

Exemplars are defined as expert-supervised audio recordings of predetermined spoken word samples made for the purpose of voice identification comparison. During the exemplar creation process, it is important to coach the person speaking (the subject) into the same level of energy as the evidence recording of the unknown voice. Listen to the energy and attitude of the voice you are examining (the evidence or unknown recording). Do you hear a mood or psychological characteristic in the voice?

In some bomb threat recordings I have examined, the speakers have an angry, sad or depressed attitude in their voice while speaking the recorded words. It is important to note that at the time of creating an exemplar, the subject is often not in the same psychological state as the individual in the unknown recording. While making the exemplar, do your best to coach the person (suspect) to speak with the same energy as the voice on the evidence recording.

Your critical listening ear will help you complete this process to the best of your (and their) ability. You have to listen critically beyond the subject’s current mood, because it is often difficult to coach them into the mood of the person on the evidence recording. Listen for specific speech characteristics in the exemplar and evidence recordings. What do you notice about the unknown voice that is characteristic of the known voice?

To practice, spend some time listening to spoken word recordings. These can be in the form of talk radio, podcasts and audio books. Write down speaking characteristics of the voice recordings like this:

• English accent

• Southern accent

• Consistent sibilant “s”

• Consistent long “a”

• Medium, low or high pitch

• Emphasis on “al,” as in “halp” instead of “help”

• A characteristic rhythm to the speech, or a pattern of delivering words and pausing

Listen to several spoken word recordings and make a list of speech characteristics. Take notes on your observations.

Only through practice and experience will you become familiar with voice identification. When creating a new audio comp or assembly file in Sony Sound Forge or Adobe Audition, you can listen to the speech sections you are comparing repeatedly and with easy access. Back-to-back critical listening is an extremely important tool for voice identification. It is the best way to develop your critical listening skills and begin to recognize the different speaking characteristics of each voice examined. The familiar and unfamiliar speaking samples can be identified and their characteristics easily noted.

Learn more about Voice Identification and Critical Listening in Forensic Expert Ed Primeau’s new book, That’s Not My Voice! available now on Amazon.

Knowing Your Digital Audio Recorder

December 18th, 2014

With digital audio recorders, there are a lot of options when it comes to the quality of the audio recording. Despite easy access to these options, they are often overlooked. People are either unaware of these settings or simply forget to check them when they begin a recording. While most settings on a digital recorder will yield a good enough quality recording, I have come across digital recorders with very low quality settings that could result in very distorted or unintelligible recordings. If you are using a digital audio recorder, it is important to have a basic understanding of what contributes to the quality of your audio recording.

Two major settings to be aware of are the sample rate and the bit depth of your recording. The sample rate determines how often a sample is taken from an incoming waveform. The bit depth determines the number of bits used for each of those samples. Together, these settings and the number of channels determine the bitrate, which is the number of bits processed per unit of time. Bitrate plays a bigger part in lossy audio files.

Sample Rate

There are a few standard sample rates used in most recorders, often including 44.1kHz, 48kHz, and 96kHz. Audio is usually recorded at 44.1kHz to capture the full range of human hearing. An audio waveform has a positive and negative pressure area; therefore, a minimum of two samples must be taken from a frequency to reproduce it. The range of human hearing is generally given as 20Hz to 20kHz, though it can vary from person to person. With a sample rate of 44.1kHz, frequencies as high as 22kHz can be recorded, which more than covers the average person's hearing range. Higher sample rates such as 96kHz capture twice as many samples and therefore create a higher quality recording, though most would argue that it is almost impossible to hear any quality difference without professional audio equipment.

Bit Depth

The bit depth, as mentioned, determines the resolution of each sample that is taken. A 16-bit or 24-bit setting is most common, depending on the medium; audio CDs, for example, use 16-bit audio. The bit depth determines the maximum signal-to-noise ratio of a recording, at roughly 6dB per bit. The signal-to-noise ratio is the comparison of the desired signal to background and internal noise. A 16-bit recording has a 96dB signal-to-noise ratio, while a 24-bit recording has a 144dB ratio. While 24-bit does have a higher SNR, the 96dB range of a 16-bit recording is often more than enough to create a good quality recording.
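The roughly-6dB-per-bit relationship can be checked with a trivial calculation. (The exact theoretical figure for linear PCM is about 6.02dB per bit plus a small constant; the simple rule of thumb below matches the round numbers used in the text.)

```python
def max_snr_db(bit_depth):
    """Theoretical maximum signal-to-noise ratio of linear PCM audio,
    using the simple ~6 dB-per-bit rule of thumb."""
    return 6 * bit_depth

print(max_snr_db(16))  # 96  (CD-quality audio)
print(max_snr_db(24))  # 144 (professional recording)
```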

Bitrate

When using a format such as an MP3, bit depth no longer applies because of the lossy compression format. This is when bitrate becomes a more important factor in a recording. The bitrate is the number of bits processed in an amount of time, typically written in kilobits per second. The bitrate of an uncompressed audio file, such as a .WAV file, can be determined from the bit depth, sample rate, and number of channels. A CD with 44.1kHz, 16-bit stereo audio has a bitrate of 1411kbps. MP3 and other lossy audio files typically have much lower bitrates, which is why they are so much smaller than uncompressed formats. They achieve this through perceptual coding, which essentially removes parts of the data judged unnecessary and imperceptible to the human ear. Typical MP3 music files have bitrates between 192kbps and 320kbps in order to maintain good quality. Digital recorders that record lossy formats will often have optional bitrates as low as 32kbps.
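The uncompressed bitrate calculation above is just sample rate times bit depth times channel count. A quick sketch, using the CD figures from the text plus a hypothetical mono voice recording for comparison:

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of uncompressed (linear PCM) audio such as a .WAV file."""
    return sample_rate_hz * bit_depth * channels / 1000

# CD audio: 44.1 kHz, 16-bit, stereo -> the ~1411 kbps cited above.
print(pcm_bitrate_kbps(44_100, 16, 2))  # 1411.2

# A mono voice recording at 8 kHz / 16-bit is an order of magnitude smaller.
print(pcm_bitrate_kbps(8_000, 16, 1))   # 128.0
```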

When choosing what settings to use for a recording, it’s important to consider the purpose of the recording. Music production is usually done with at least a 44.1kHz sample rate and a 16-bit depth. WAV and AIFF files are typically the file formats used for the master recording. When later compressed to MP3, as mentioned before, a bitrate between 192kbps and 320kbps is used to maintain the highest quality possible after compression. When a digital recorder is being used for another purpose, such as recording a conversation, other settings may optimize the performance and memory of the unit while still maintaining a high enough quality.

Whenever a smaller sample rate, bit depth or bitrate is used, the recording will always take up less space on the memory of the recorder. This can be very important to someone who may need to leave the recorder on for long periods of time. When capturing audio evidence, a recorder may need to be left on for hours or even days. If this is the case, and a lower quality file needs to be used, it is important to know how to go about maintaining quality while optimizing the memory.
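To make the memory trade-off concrete, here is a rough estimate of how long a recorder can run at a given bitrate. The 2GB capacity is a hypothetical example, and real recorders reserve some space for firmware and file-system overhead, so treat these as ballpark figures.

```python
def hours_of_recording(memory_mb, bitrate_kbps):
    """Approximate hours of audio that fit in the given memory
    at a constant bitrate."""
    total_bits = memory_mb * 1_000_000 * 8      # megabytes -> bits
    seconds = total_bits / (bitrate_kbps * 1000)
    return seconds / 3600

# A hypothetical 2 GB (2000 MB) recorder:
print(round(hours_of_recording(2000, 320), 1))  # ~13.9 hours at 320 kbps
print(round(hours_of_recording(2000, 32), 1))   # ~138.9 hours at 32 kbps
```

Dropping from 320kbps to 32kbps turns half a day of capacity into nearly six days, which is why long surveillance-style recordings push users toward the low-bitrate settings discussed next.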

Options and Limitations

While the range of human hearing extends up to 20kHz, the fundamental frequencies of the voice do not fall in the higher end of that range. The most important voice content sits below roughly 4kHz, with key harmonics between 1kHz and 4kHz. Because of this, it is possible to capture a completely audible and intelligible recording of people talking with a sample rate of only 22kHz. This would mean the highest frequency recorded would be 11kHz, still much higher than the most important frequencies in the voice. Some recorders can even be set to an 8kHz sample rate. While this does save a lot of space on the recorder, it means the cutoff frequency would be 4kHz. This may be acceptable for some applications but may also cut down on the clarity of the voices. When a large amount of background noise is present, the higher frequencies between 4kHz and 10kHz can add some needed clarity to the voices. It is always a good idea to test the different sample rates before using them to make sure the quality will be adequate for its purpose.

When trying to optimize the memory on a digital recorder, it is almost always a good idea to use a lossy compression format, such as an MP3. This means the bitrate, rather than the bit depth, will determine the size of the recording. As mentioned before, a bitrate between 192kbps and 320kbps usually gives very good quality for an MP3. When recording only a voice, where the content of the recording rather than perfect quality is the concern, lowering the bitrate can be very helpful for conserving space. One should be cautious when lowering the bitrate, because the data compression may begin to affect the intelligibility of the recording. When too much compression is introduced, digital noise becomes easier to hear, which can sometimes cover up the desired signal. I have heard 32kbps recordings with so much added digital noise that much of the conversation had become unintelligible.

In summary, digital audio quality is determined by its sample rate and bit depth or bitrate. There are many options for these settings and not all of them may result in a good quality recording. It is always important to check these settings and be aware of the limitations each setting comes with before beginning a recording. Take into account the content of what you are recording and the quality of audio that is needed. The better you know your digital recorder, the more effective it becomes.

 
