Forensic Audio Enhancement Critical Listening

What is audio forensics? Wikipedia defines audio forensics as the field of forensic science relating to the acquisition, analysis, and evaluation of sound recordings that may ultimately be presented as admissible evidence in a court of law or some other official venue.

Audio forensic evidence may come from a criminal investigation by law enforcement or as part of an official inquiry into an accident, fraud, accusation of slander, or some other civil incident.

The primary aspects of audio forensics are establishing the authenticity of audio evidence, performing enhancement of audio recordings to improve speech intelligibility and the audibility of low-level sounds, and interpreting and documenting sonic evidence, such as identifying talkers, transcribing dialog, and reconstructing crime or accident scenes and timelines.

Modern audio forensics makes extensive use of digital signal processing, with the former use of analog filters now being obsolete. Techniques such as adaptive filtering and discrete Fourier transforms are used extensively.


The acquisition of digital audio recordings must follow the protocols accepted within the scientific community. These protocols have been established by the Scientific Working Group on Digital Evidence (SWGDE). The acquisition process is broken up into three steps: establish and examine the chain of custody, request the original, and retrieve the recordings using acceptable methods.


First, examine the chain of custody. What type of equipment was used to create the evidence? How was the evidence handled from the time of its creation to its delivery to investigators, experts, and the courtroom? Are there authentic documents and reports that outline the chain of custody? Sometimes a chain of custody log from law enforcement will be included, which strengthens the authenticity of the audio evidence. If the investigation of the chain of custody reveals inconsistencies, more often than not that recording is determined to lack authenticity and integrity.


If the expert is able to retrieve the evidence from the original source, in most cases that will establish an authentic chain of custody, provided the retrieval is done properly. This retrieval process must be documented through video recording or images in order to provide an accurate record of what the expert did. If it isn’t possible to retrieve the recording(s), then the forensic expert must carefully go through all of the documents and reports that arrived with the evidence. If the chain of custody still cannot be established, the forensic expert must rely on other techniques, such as requesting the original, as well as their own expertise to determine the authenticity of the chain of custody.


“As a general rule, a forensic audio laboratory should request the original recording or the earliest generation available. An original recording is the first manifestation of sound in a recoverable stored format. If the original recording is on analog media, playback and duplication rely on physical processes that introduce noise and degrade the signal, even if slightly. A copy of an analog recording can never be an exact duplicate. An original digital recording is a bit stream from which the acoustic audio signal can be generated. Exact copies of that bit stream can be made. With digital evidence, each stage of copying can be exact with no loss of quality between generations. The exactness can be tested and confirmed through the use of a hash function. Therefore, a bit stream duplicate of a recorded file is equivalent to the original.” -SWGDE Best Practices
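The hash check mentioned in the quote can be sketched in a few lines of Python. This is only an illustration under my own assumptions: SHA-256 is one common choice of hash function (the quote does not name one), and the file names and byte contents below are throwaway stand-ins, not real evidence.

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_exact_copy(original_path, copy_path):
    """True when both files produce the same SHA-256 digest."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)

# Hypothetical stand-in files: an "original", a bit stream duplicate,
# and a copy in which a single bit has been flipped.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.bin")
duplicate = os.path.join(workdir, "duplicate.bin")
altered = os.path.join(workdir, "altered.bin")

payload = bytes(range(256)) * 64
with open(original, "wb") as f:
    f.write(payload)
shutil.copyfile(original, duplicate)
with open(altered, "wb") as f:
    changed = bytearray(payload)
    changed[100] ^= 0x01  # flip one bit
    f.write(changed)

copy_matches = is_exact_copy(original, duplicate)
altered_matches = is_exact_copy(original, altered)
print("duplicate matches:", copy_matches)
print("altered matches:", altered_matches)

shutil.rmtree(workdir)  # clean up the throwaway files
```

A matching digest supports the claim that the copy is a bit-for-bit duplicate. It says nothing about whether the original itself is authentic.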


“Means of securing the recorded evidence must be evaluated based on their effect on the recorded signal, and the available method of transfer preserving the evidence in a condition as close to the original as possible should be chosen. Use multiple means of collection if it is not apparent which available means will produce the highest quality.” (SWGDE Best Practices for Forensic Audio, p. 10) Appropriate retrieval methods are as follows:



One of the most common audio issues that I address during an enhancement is noise and other extraneous ‘unwanted’ sounds. The noise floor is usually consistent throughout the recording and can be removed to varying degrees by using noise reduction software. The most complicated issues are the extraneous sounds that are not continuous. These sounds could include anything from a plane flying overhead to someone whistling while people talk. These sounds are difficult to pinpoint with standard tools like noise reduction and equalization, but they can be identified using a spectrogram.

A spectrogram shows both the frequency content of a recording and the level of those frequencies over time. It may be the most helpful tool to an audio forensic expert because it visually presents everything that is happening throughout the audio in one window. Using this, the expert can both identify and address individual harmful noises in the recording. With the right software, these individual sounds can be selected and removed without affecting any other part of the recording. It is important to remember that there is a right and a wrong way to do this, which is why only a trained audio forensic expert should be hired to complete an enhancement for use in court.
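To make the idea of a spectrogram concrete, here is a minimal sketch that computes one with a short-time Fourier transform using only the Python standard library. It is not how forensic software does it (real tools use fast FFT routines and much longer recordings); the frame size, hop, sample rate, and the synthetic "hum plus whistle" signal are all assumptions chosen for illustration.

```python
import cmath
import math

def hann(n):
    """Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame lists magnitudes for bins 0..frame_size/2."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT of one bin: slow, but fine for a short demo.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

# Synthetic example: a steady 500 Hz hum, with a louder 2000 Hz
# "whistle" that only appears in the second half of the signal.
sample_rate = 8000
frame_size = 64
signal = []
for n in range(1024):
    hum = 0.3 * math.sin(2 * math.pi * 500 * n / sample_rate)
    whistle = math.sin(2 * math.pi * 2000 * n / sample_rate) if n >= 512 else 0.0
    signal.append(hum + whistle)

frames = spectrogram(signal, frame_size=frame_size, hop=32)

# Strongest frequency in each frame, converted from bin index to Hz.
peak_hz = [max(range(len(m)), key=m.__getitem__) * sample_rate / frame_size
           for m in frames]
print("first frame peak:", peak_hz[0], "Hz")
print("last frame peak:", peak_hz[-1], "Hz")
```

The strongest frequency moves from the hum to the whistle partway through, which is the kind of non-continuous sound that stands out visually in a spectrogram even when it is hard to isolate by ear.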

When processing audio, it is easy to introduce artifacts into the recording. Artifacts are unwanted sounds produced by various processing and compression techniques. Since the goal of an audio enhancement is to eliminate extraneous noise, introducing artifacts is the exact opposite of what you want when working with a recording. Many things can introduce artifacts, but the simplest way to describe the cause is overprocessing. By overprocessing, I mean using extreme settings within individual audio tools.

When adjusting individual ranges of frequencies on the spectrogram, it is very important to be aware of artifacts. Being able to recognize artifacts and know the limitations of what processing can be done is what makes an audio forensic expert necessary. When isolated portions are processed with a trained ear and the right knowledge, noise can be eliminated and voices can be brought out without introducing any artifacts.


The authentication process determines, with scientific certainty, the authenticity of the events that are represented, the integrity of the recording, and ultimately whether or not the audio recording in question has been tampered with. In this age of digital audio, edits can be made and covered up very easily. Free audio editing software, such as Audacity, is available online and can make edits that alter the events or conversation originally captured in a digital audio recording.

When one of the parties in litigation believes that an audio recording was tampered with or edited, an audio forensic expert is brought in to investigate the recording. When we authenticate an audio recording, the first step is to establish chain of custody. While it is the first step, chain of custody does not, in and of itself, establish a recording as being authentic. I have witnessed audio evidence that was not authentic and was stored in a digital audio recorder. So what does the authentication process look like?


Critical listening must be the first step, so that the expert becomes familiar with the audio evidence. Edits discovered during the critical listening phase usually take the form of abrupt changes. Detecting these changes is not easy and comes with experience.

It’s important for the forensic expert to put themselves in a quiet, isolated room during critical listening so as to avoid any outside disturbances. The quiet environment enhances the critical listening focus. High quality, professional grade monitoring headphones and high quality studio monitors (speakers) are best for critical listening analysis of digital audio recordings. Professional quality headphones and speakers will have the flattest frequency response, which means they produce neutral and natural sound. This is very important for the forensic expert because subtle boosts and cuts in frequencies can impact the analysis of the digital audio recording.



After critical listening, the forensic expert must use electronic measurement to examine the audio evidence. This is done by noting the prominent frequencies in the voices or other sound source and the noise floor. The levels of the recording and of the different frequencies can be measured as well. Tools such as spectrograms, frequency analysis windows and level meters are very helpful for observing and collecting this information. The expert should note the frequency range of the overall recording, the voices or conversation and the noise floor or extraneous sounds in the recording.
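As a rough sketch of what a level meter reports, the following computes peak and RMS levels in dBFS (decibels relative to full scale). It assumes samples are floating-point values where 1.0 is full scale; the test tone, sample rate, and function names are my own choices for illustration.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS, assuming full scale is 1.0."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """RMS (average power) level in dBFS, assuming full scale is 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

# One second of a 1 kHz test tone at half of full scale.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / sample_rate)
        for n in range(sample_rate)]

peak_level = peak_dbfs(tone)
rms_level = rms_dbfs(tone)
print(f"peak {peak_level:.2f} dBFS, RMS {rms_level:.2f} dBFS")
```

For a sine wave the RMS level sits about 3 dB below the peak. Speech and noise have different peak-to-RMS relationships, which is one reason both numbers are worth noting.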

If the frequency range of a voice suddenly becomes larger or smaller or shifts in frequency range, that can be a sign of an edit. Sudden, unexplained changes in the noise floor level as well as the sudden presence of another background noise can also be a sign of an edit. As I mentioned before, I have come across recordings in which I could hear two noise floors. This can often be measured and seen in a spectrogram and a frequency analysis panel.
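A sudden change in noise floor level can also be flagged numerically. The sketch below measures the level of consecutive frames and reports any frame-to-frame jump larger than a threshold. It assumes the material is background noise only (speech would trigger it constantly), and the 6 dB threshold, frame size, and synthetic splice of two different noise floors are illustrative assumptions.

```python
import math
import random

def frame_levels_db(samples, frame_size):
    """RMS level in dB for each consecutive, non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        mean_square = sum(s * s for s in frame) / frame_size
        levels.append(10 * math.log10(max(mean_square, 1e-12)))
    return levels

def find_level_jumps(levels, threshold_db=6.0):
    """Return (frame_index, change_in_db) where the level changes abruptly."""
    jumps = []
    for i in range(1, len(levels)):
        change = levels[i] - levels[i - 1]
        if abs(change) > threshold_db:
            jumps.append((i, change))
    return jumps

# Synthetic splice: a quiet noise floor followed by a louder one.
rng = random.Random(0)  # fixed seed so the demo is repeatable
quiet = [rng.gauss(0, 0.01) for _ in range(4000)]
louder = [rng.gauss(0, 0.04) for _ in range(4000)]
spliced = quiet + louder

levels = frame_levels_db(spliced, frame_size=400)
jumps = find_level_jumps(levels)
jump_frames = [index for index, _ in jumps]
print("abrupt level changes at frames:", jump_frames)
```

A flagged frame is only a place to look more closely. Automatic gain control, a door opening, or a recorder being moved can all change the noise floor without any editing.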


Visual Inspection

Visually inspecting the audio waveform and spectrogram is the next step in authenticating the audio. This goes hand in hand with the electronic measurement as the forensic expert analyzes the physical wave properties and frequency information. Waveforms are continuous and smooth when examined very closely. Even a quick, loud sound like a clap will have a smooth, continuous wave. If there are sudden breaks in the waveform of a recording, these are signs of editing. The expert should also pay close attention to the phase of the waveform, which can also be seen when zooming in. If the waveform of the recording is suddenly inverted, this can also mean an edit was made.
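The "sudden break" described above can be approximated in code by looking for a sample-to-sample step that is far larger than is typical for the recording. This is a simplified sketch, not a validated forensic method: the factor of 8 times the median step is an arbitrary assumption, and the spliced sine wave is synthetic.

```python
import math
import statistics

def find_discontinuities(samples, factor=8.0):
    """Return sample indices where the step from the previous sample
    is more than `factor` times the median step size."""
    steps = [abs(samples[i] - samples[i - 1]) for i in range(1, len(samples))]
    typical = max(statistics.median(steps), 1e-9)  # guard against digital silence
    return [i + 1 for i, step in enumerate(steps) if step > factor * typical]

# A smooth sine wave with an 80-sample period.
original = [math.sin(2 * math.pi * n / 80) for n in range(800)]

# Simulate a crude edit: cut out 20 samples (a quarter period),
# so the waveform jumps abruptly at index 400.
spliced = original[:400] + original[420:]

clean_breaks = find_discontinuities(original)
edit_breaks = find_discontinuities(spliced)
print("breaks in original:", clean_breaks)
print("breaks in spliced:", edit_breaks)
```

A careful editor will cut at zero crossings or add a short crossfade, which this check will not catch. Genuine clicks and clipped peaks can also trigger it, so any hit still needs visual and aural confirmation.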

The spectrogram will show the full frequency spectrum, with warmer or colder colors representing the strength of each frequency. The noise floor can be seen very clearly in this view, helping to identify breaks in the sound. All recordings have some noise floor, even if it is almost inaudible. When viewing the spectrogram, any breaks in the noise floor may be signs of an edit. Changes in the volume of the noise floor can also be a sign of an edit.

Metadata Analysis

Analyzing Metadata in Digital Audio Evidence

When I first began working as an audio forensic expert, most of my work was with analog audio evidence in the form of mini, micro and standard audio cassettes. I did have some cases where reel to reel tape was used. Today almost all recordings are made digitally, so there is additional information that can be analyzed when performing an audio authentication. Digital audio recordings contain metadata which reveals information about how the recording was made and the type of equipment that created the recording. If a recording was loaded into a software program capable of performing edits, there will often be a footprint left in the recording's HEX information showing what software was used.

When examining the digital information, it is necessary to create an exemplar recording to compare the metadata with the original. An exemplar is a recording that is made in conditions that are as close to the original recording as possible. The exemplar is made on the same kind of audio recorder and, if possible, in the same environment. Using this exemplar, the forensic expert can compare the metadata and HEX information of the two files. If there are inconsistencies in the data, that can also be a sign of tampering.
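As one illustration of an exemplar comparison, the sketch below lists the top-level chunks of two WAV files and reports any chunk present in the questioned file but not in the exemplar. It assumes the evidence is in RIFF/WAVE format, which is only one of many possibilities. Both files are built in memory as tiny stand-ins, and "ExampleEditor" is a made-up software name, not a real product.

```python
import struct

def make_chunk(chunk_id, payload):
    """Build one RIFF chunk: 4-byte id, little-endian size, payload, pad byte if odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad

def make_wav(extra_chunks=()):
    """Build a tiny PCM WAV in memory (hypothetical stand-in for a real file)."""
    fmt = struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16)  # PCM, mono, 8 kHz, 16-bit
    audio = struct.pack("<4h", 0, 100, 0, -100)
    body = b"WAVE" + make_chunk(b"fmt ", fmt)
    for chunk_id, payload in extra_chunks:
        body += make_chunk(chunk_id, payload)
    body += make_chunk(b"data", audio)
    return b"RIFF" + struct.pack("<I", len(body)) + body

def list_riff_chunks(data):
    """Return [(chunk_id, size), ...] for the top-level chunks of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return chunks

# Exemplar: what the recorder writes on its own (assumed: fmt and data only).
exemplar = make_wav()

# Questioned file: same audio, plus a LIST/INFO chunk with a software tag.
software_tag = b"INFO" + make_chunk(b"ISFT", b"ExampleEditor\x00")
questioned = make_wav(extra_chunks=[(b"LIST", software_tag)])

exemplar_chunks = list_riff_chunks(exemplar)
questioned_chunks = list_riff_chunks(questioned)
exemplar_ids = {chunk_id for chunk_id, _ in exemplar_chunks}
unexpected = [chunk_id for chunk_id, _ in questioned_chunks
              if chunk_id not in exemplar_ids]
print("exemplar:", exemplar_chunks)
print("questioned:", questioned_chunks)
print("chunks not found in exemplar:", unexpected)
```

An extra chunk is not proof of editing by itself, since some recorders write their own metadata chunks. That is exactly why the comparison is made against an exemplar from the same kind of recorder rather than against a generic expectation.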