This page explains how to use the Audioanalyzer to analyze your audio files. Short tooltips on the main page also help you along. To find a topic quickly, use the menu on the left to jump to the corresponding help section. The website is built with the following technologies:

- JavaScript ES6 (JS ES6)
- Plotly (JS-Framework)
- Rainbow (JS-Framework)
- MathJax (JS-Framework)
- Hypertext Markup Language (HTML5)
- Cascading Style Sheets (CSS3)

The audio processing that analyzes your loaded file is based on a free Fast Fourier Transform (FFT) library written in JavaScript (JS) by Nayuki. For more information, visit www.nayuki.io. Without a background in signal processing, it can be hard to analyze a sound file quickly and easily. Our main goal was therefore to create an easy-to-use tool that analyzes audio files in terms of magnitude, phase and other quantities that can be displayed and interpreted as a spectrogram. You can apply different settings for different kinds of examination, and we have tried to describe all parameters in plain terms. To describe the different windows, we used **plotly.js** for the illustrations, **rainbow.js** to highlight the code and **mathJax.js** to render LaTeX formulas in HTML5.

Click “Choose a file” to load an audio file from your local computer. You can analyze different audio formats, e.g. WAV or MP3. After you have chosen a file, a loading screen appears. Once it disappears, the waveform and the magnitude spectrum are displayed below the buttons, and the load button shows the name of the loaded file. Click the button again to load a different audio file. Your audio data is only analyzed and is never stored outside your computer: there is no web server in the background, and all calculations take place on your own computer. Figure 1 shows the file manager in which you can open an audio file.

After an audio file is loaded, you can play, pause and stop it with the corresponding buttons in the control bar (Figure 2). The control bar also contains the "Choose a file" button described in the previous section. The volume slider next to the stop button adjusts the volume, and next to the volume slider the current playback time and the total length of the file are displayed. You can use the play and pause buttons to control playback while you analyze. You can also loop a selected part of the audio file by ticking the box next to "Loop Selection"; how to select a part of your audio file is explained in the Waveform section. For the analysis, you can also choose a grid size for the waveform: small, medium or large.

The waveform consists of the linear amplitude (blue) and the RMS (light blue), as shown in Figure 3. The x-axis shows time over the total track length. While your mouse is over the waveform, the data at the current mouse position is displayed in its upper left corner. When you press play after loading an audio file, a red line marks the current playback position in the waveform. To jump to another playback position, click once on the desired position in the waveform or the spectrogram. To select a part of the file, click on the start position of the part you want to select, hold the mouse button and drag to the end of the desired section. While you drag, the area being selected is shaded red. After you release the mouse button, the spectrogram zooms to show the complete selected part. If you now press play, only the selected part is played. To play the selected part in an endless loop, check the box next to "Loop Selection".

The RMS of $N$ samples $x_{i}$ is calculated as RMS $= \sqrt{\dfrac{1}{N}\sum\limits_{i=0}^{N-1} x_{i}^{2}}$.
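As a rough illustration, the RMS formula can be written as a small JavaScript function. This is a minimal sketch, not the Audioanalyzer's own code: the function name `rms` is hypothetical, and it assumes the samples are available as a plain array or `Float32Array` with values between -1 and 1 (the range the Web Audio API uses for decoded audio). The length of the sections over which the page computes its RMS curve is not stated here.

```javascript
// Root mean square of the samples x[0 .. N-1]:
// RMS = sqrt( (1/N) * sum of x[i]^2 )
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// Example: a full-scale sine (10 whole periods) has an RMS of 1/sqrt(2)
const sine = Array.from({ length: 1000 }, (_, i) => Math.sin(2 * Math.PI * 10 * i / 1000));
console.log(rms(sine).toFixed(3)); // 0.707
```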

Figure 4 shows the magnitude spectrogram of a loaded file. You can also display a phase, group delay or instantaneous frequency deviation spectrogram (see the section "Display-Type"). All of them show a value as a function of time and frequency: frequency on the y-axis and time on the x-axis. The y-axis runs from 0 to half the sampling frequency of your audio file. If you move your cursor over the spectrogram, the position data is displayed in the upper left corner. The value at each time-frequency point is encoded as a color, and the color legend on the right side of the spectrogram shows which value each color represents. For more detail, you can zoom into the spectrogram with the buttons in the lower right corner of the spectrogram window; both the time and the frequency axis can be zoomed. As described in the previous section, you can also select a short piece of your audio file: when you select a section in the waveform, the spectrogram automatically zooms into the selected part. To return to the complete spectrogram, use the zoom buttons or double-click on the spectrogram.

- Blocklength [samples]
  - 512
  - **1024**
  - 2048
  - 4096
  - 8192
- Window-Type
  - **Hann**
  - Rectangle
  - Hann-Poisson
  - Cosine
  - Flat-Top
  - Hamming
  - Blackman
- Overlap [%]
  - 0
  - 25
  - **50**
  - 75
  - 90
- Display-Type
  - **Amplitude spectrogram**
  - Phase spectrogram
  - Group delay spectrogram
  - Instantaneous frequency deviation spectrogram
- Colormap
  - **Viridis**
  - Gray
  - JET
  - Plasma
  - Twilight
  - Sunlight

You can customize the spectrogram by choosing the parameters listed above; the default values are shown in bold. For the amplitude spectrogram, you can additionally set a minimum and a maximum value that restrict the color range.
Depending on what you want to analyze, you can choose between four display types, which are discussed in the section **Display-Types**.

Figure 5 shows the parameter dropdown menus and the input fields for the minimum and maximum range values. On the right side of the image you can see the spectrogram legend, which depends on the chosen display type.

The block length is the number of samples per block when the audio signal is divided into equal blocks. In our case the block length equals the FFT length. Each time slot (column) of the spectrogram represents one block, so the block length defines the time resolution of the spectrogram. The shorter the block length, the more time points are calculated, because the signal is split into more blocks. On the other hand, the shorter the block length, the lower the frequency resolution. For example, if your file has a sample rate ($f_{\text{s}}$) of 44.1 kHz and you choose a block length of 1024, the FFT calculates 1024 frequency bins, each representing a band of $44100/1024 \approx 43$ Hz (only the bins up to $f_{\text{s}}/2$ are displayed). If you choose a block length of 2048, you get half as many blocks, but each frequency bin represents only about 21.5 Hz. So you always have to trade off frequency resolution against time resolution.
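The trade-off can be checked with a few lines of arithmetic. The sketch below assumes blocks without overlap and uses a hypothetical helper `blockResolution`; it computes the bin spacing $f_{\text{s}}/N$, the duration of one block and the number of complete blocks in a 10 s file.

```javascript
// Frequency resolution vs. time resolution for a given block length.
// Assumes no overlap: each block starts where the previous one ended.
function blockResolution(sampleRate, blockLength, totalSamples) {
  return {
    binWidthHz: sampleRate / blockLength,              // spacing of the FFT bins
    blockDurationMs: 1000 * blockLength / sampleRate,  // time span of one block
    numBlocks: Math.floor(totalSamples / blockLength)  // incomplete last block dropped
  };
}

// 10 s of audio at 44.1 kHz
for (const n of [1024, 2048]) {
  const r = blockResolution(44100, n, 441000);
  console.log(n, r.binWidthHz.toFixed(1) + " Hz", r.blockDurationMs.toFixed(1) + " ms", r.numBlocks);
}
// 1024 43.1 Hz 23.2 ms 430
// 2048 21.5 Hz 46.4 ms 215
```

With overlap, the number of blocks grows, but the duration of each block and the bin spacing stay the same.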

The following section covers the different window types. We have tried to explain the windows in several ways: mathematically, graphically and as we implemented them in JS. In general, a window can be seen as a weighting applied to the audio signal in the time domain. Looking at a window function in both the time and the frequency domain, you can characterize it by the following frequency-domain properties [4]:

- Effective noise bandwidth
- 3 dB bandwidth
- Ripple in the passband
- Highest side lobe
- Side lobe fall-off rate
- 60 dB bandwidth
- Shape factor
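Some of these figures can be computed directly from the window samples. The following sketch computes the effective noise bandwidth in bins, using the common definition $N\sum w^{2}(n) / \left(\sum w(n)\right)^{2}$ (see e.g. Harris [3]). The function name is hypothetical; the window samples follow the definitions in the next subsections.

```javascript
// Effective noise bandwidth of a window, in FFT bins (multiply by fs/N for Hz):
// ENBW = N * sum(w[n]^2) / (sum(w[n]))^2
function effectiveNoiseBandwidth(w) {
  let sum = 0;
  let sumOfSquares = 0;
  for (const value of w) {
    sum += value;
    sumOfSquares += value * value;
  }
  return w.length * sumOfSquares / (sum * sum);
}

const N = 1024;
const n = Array.from({ length: N }, (_, i) => i);
const rect = n.map(() => 1);
const hann = n.map((i) => 0.5 * (1 - Math.cos(2 * Math.PI * i / (N - 1))));

console.log(effectiveNoiseBandwidth(rect).toFixed(2)); // 1.00
console.log(effectiveNoiseBandwidth(hann).toFixed(2)); // 1.50
```

A wider noise bandwidth means that more noise power falls into each bin, which is one reason why the figures in Table 2 differ between windows.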

The following subsections explain each window function in more detail: the mathematical definition of the window, an illustration and the implementation. Table 2 lists key figures for each implemented window function. For the illustrations, all windows are calculated with the default block length of 1024. To link the math and the code, $N$ ≙ windowLen.length and $n$ ≙ windowLen[i].

The Rect Window is defined by: $w(n) = 1$

The rectangular window is known for its narrow 3 dB bandwidth (0.89 $\Delta$f). For analyzing music or speech (deterministic/harmonic) signals, however, it is a poor choice, because as a filter it has [4]:

- Very poor selectivity, due to a 60 dB bandwidth of 665 $\Delta$f
- A relatively large passband ripple (3.9 dB)

If the signal you want to analyze is a sinusoid whose frequency coincides exactly with the center frequency of one of the bins, you get a very good result in the spectrum. But that requires knowing your measurement signal. If you analyze a signal whose frequencies fall between the bin center frequencies, such as music, each bin also picks up frequency components away from its center frequency, weighted with the values of the side lobes. Energy from one component therefore leaks into many bins. This effect is called leakage [4].
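Leakage can be reproduced with a few lines of JavaScript. The sketch below is an illustration only: it uses a plain direct DFT (`dftBinMagnitude`, a hypothetical helper rather than Nayuki's FFT) on an unweighted block, i.e. the rectangular window. It compares a sine exactly on the center of bin 100 with a sine halfway between bins 100 and 101 for $N = 1024$: the off-bin peak drops by about 3.9 dB, and bins far from the tone no longer read zero.

```javascript
// Magnitude of DFT bin k of the block x (direct evaluation of the DFT sum)
function dftBinMagnitude(x, k) {
  const N = x.length;
  let re = 0;
  let im = 0;
  for (let n = 0; n < N; n++) {
    const phi = -2 * Math.PI * k * n / N;
    re += x[n] * Math.cos(phi);
    im += x[n] * Math.sin(phi);
  }
  return Math.hypot(re, im);
}

const N = 1024;
// Rectangular window: the block is used unweighted, w(n) = 1
const sine = (bins) => Array.from({ length: N }, (_, n) => Math.sin(2 * Math.PI * bins * n / N));
const onBin = sine(100);    // exactly on the center of bin 100
const offBin = sine(100.5); // halfway between bins 100 and 101

const peakOn = dftBinMagnitude(onBin, 100); // N/2 = 512
const peakOff = dftBinMagnitude(offBin, 100);
console.log((20 * Math.log10(peakOff / peakOn)).toFixed(1)); // -3.9 (dB)

// 10 bins away from the tone: zero for the on-bin sine, clearly non-zero off-bin
console.log(dftBinMagnitude(onBin, 110) < 1e-6, dftBinMagnitude(offBin, 110) > 1); // true true
```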

```
/*
 * This code shows the implementation of a rect window in JavaScript
 */
// Sample indices n = 0 ... N-1 for the default block length N = 1024
const windowLen = Array.from({ length: 1024 }, (_, i) => i);
const windowValueRect = calculateWindow(windowLen);
function calculateWindow(windowLen) {
  // w(n) = 1 for every sample
  return new Array(windowLen.length).fill(1);
}
```

The Hann Window is defined by: $w(n) = \dfrac{1}{2}\left(1-\cos\left(\dfrac{2\pi n}{N-1}\right)\right)$

```
/*
 * This code shows the implementation of a Hann window in JavaScript
 */
// Sample indices n = 0 ... N-1 for the default block length N = 1024
const windowLen = Array.from({ length: 1024 }, (_, i) => i);
const windowValueHann = calculateWindow(windowLen);
function calculateWindow(windowLen) {
  const w = new Array(windowLen.length);
  for (let i = 0; i < windowLen.length; i++) {
    // w(n) = 0.5 * (1 - cos(2*pi*n / (N-1)))
    w[i] = 0.5 * (1 - Math.cos(2 * Math.PI *
      windowLen[i] / (windowLen.length - 1)));
  }
  return w;
}
```

The Hann-Poisson Window is defined by: $w(n) = \dfrac{1}{2}\left(1-\cos\left(\dfrac{2\pi n}{N-1}\right)\right)\text{e}^{{\dfrac{-\alpha\vert N-1-2n\vert}{N-1}}}$

```
/*
 * This code shows the implementation of a Hann-Poisson window in JavaScript
 */
// Sample indices n = 0 ... N-1 for the default block length N = 1024
const windowLen = Array.from({ length: 1024 }, (_, i) => i);
const windowValueHannPoisson = calculateWindow(windowLen);
function calculateWindow(windowLen) {
  const w = new Array(windowLen.length);
  // alpha is a parameter that controls the slope of the exponential
  // (Wiki: https://en.wikipedia.org/wiki/Window_function)
  const alpha = 2;
  for (let i = 0; i < windowLen.length; i++) {
    // Hann term multiplied by the Poisson (exponential) term
    w[i] = 0.5 * (1 - Math.cos(2 * Math.PI * windowLen[i] / (windowLen.length - 1))) *
      Math.exp((-alpha * Math.abs(windowLen.length - 1 - (2 * windowLen[i]))) /
        (windowLen.length - 1));
  }
  return w;
}
```

The Cosine Window is defined by: $w(n) = \cos\left(\dfrac{\pi n}{N-1}-\dfrac{\pi}{2}\right)$

```
/*
 * This code shows the implementation of a cosine window in JavaScript
 */
// Sample indices n = 0 ... N-1 for the default block length N = 1024
const windowLen = Array.from({ length: 1024 }, (_, i) => i);
const windowValueCosine = calculateWindow(windowLen);
function calculateWindow(windowLen) {
  const w = new Array(windowLen.length);
  for (let i = 0; i < windowLen.length; i++) {
    // w(n) = cos(pi*n / (N-1) - pi/2), as in the definition above
    w[i] = Math.cos(((Math.PI * windowLen[i]) /
      (windowLen.length - 1)) - (Math.PI / 2));
  }
  return w;
}
```

The Flat-Top Window is defined by:

$w(n) = \alpha_{0}-\alpha_{1}\cos\left(\dfrac{2\pi n}{N-1}\right)+ \alpha_{2}\cos\left(\dfrac{4\pi n}{N-1}\right)- \alpha_{3}\cos\left(\dfrac{6\pi n}{N-1}\right)+ \alpha_{4}\cos\left(\dfrac{8\pi n}{N-1}\right)$

$\alpha_{0} = 1$; $\alpha_{1} = 1.93$; $\alpha_{2} = 1.29$; $\alpha_{3} = 0.388$; $\alpha_{4} = 0.028$

```
/*
 * This code shows the implementation of a flat-top window in JavaScript
 */
// Sample indices n = 0 ... N-1 for the default block length N = 1024
const windowLen = Array.from({ length: 1024 }, (_, i) => i);
const windowValueFlatTop = calculateWindow(windowLen);
function calculateWindow(windowLen) {
  const w = new Array(windowLen.length);
  // alpha holds the coefficients alpha_0 ... alpha_4 of the cosine terms
  // (Wiki: https://en.wikipedia.org/wiki/Window_function)
  const alpha = [1, 1.93, 1.29, 0.388, 0.028];
  for (let i = 0; i < windowLen.length; i++) {
    w[i] = alpha[0]
      - alpha[1] * Math.cos(2 * Math.PI * windowLen[i] / (windowLen.length - 1))
      + alpha[2] * Math.cos(4 * Math.PI * windowLen[i] / (windowLen.length - 1))
      - alpha[3] * Math.cos(6 * Math.PI * windowLen[i] / (windowLen.length - 1))
      + alpha[4] * Math.cos(8 * Math.PI * windowLen[i] / (windowLen.length - 1));
  }
  return w;
}
```

The Hamming Window is defined by: $w(n) = \alpha - \beta \cos\left(\dfrac{2\pi n}{N-1}\right)$

$\alpha = 0.54$; $\beta = 1 - \alpha = 0.46$

```
/*
 * This code shows the implementation of a Hamming window in JavaScript
 */
// Sample indices n = 0 ... N-1 for the default block length N = 1024
const windowLen = Array.from({ length: 1024 }, (_, i) => i);
const windowValueHamming = calculateWindow(windowLen);
function calculateWindow(windowLen) {
  const w = new Array(windowLen.length);
  // alpha and beta are the weights of the constant and the cosine term
  // (Wiki: https://en.wikipedia.org/wiki/Window_function)
  const alpha = 0.54;
  const beta = 1 - alpha;
  for (let i = 0; i < windowLen.length; i++) {
    w[i] = alpha - beta * Math.cos((2 * Math.PI * windowLen[i]) / (windowLen.length - 1));
  }
  return w;
}
```

The Blackman Window is defined by: $w(n) = \alpha_{0} - \alpha_{1} \cos\left(\dfrac{2\pi n}{N-1}\right) + \alpha_{2} \cos\left(\dfrac{4\pi n}{N-1}\right)$

$\alpha_{0} = \dfrac{1 - \alpha}{2}$; $\alpha_{1} = \dfrac{1}{2}$; $\alpha_{2} = \dfrac{\alpha}{2}$; $\alpha = 0.16$

```
/*
 * This code shows the implementation of a Blackman window in JavaScript
 */
// Sample indices n = 0 ... N-1 for the default block length N = 1024
const windowLen = Array.from({ length: 1024 }, (_, i) => i);
const windowValueBlackman = calculateWindow(windowLen);
function calculateWindow(windowLen) {
  const w = new Array(windowLen.length);
  // alpha determines the coefficients alpha0, alpha1 and alpha2;
  // alpha = 0.16 gives the classic Blackman window
  // (Wiki: https://en.wikipedia.org/wiki/Window_function)
  const alpha = 0.16;
  const alpha0 = (1 - alpha) / 2;
  const alpha1 = 1 / 2;
  const alpha2 = alpha / 2;
  for (let i = 0; i < windowLen.length; i++) {
    w[i] = alpha0
      - alpha1 * Math.cos((2 * Math.PI * windowLen[i]) / (windowLen.length - 1))
      + alpha2 * Math.cos((4 * Math.PI * windowLen[i]) / (windowLen.length - 1));
  }
  return w;
}
```

Figures 6 and 7 illustrate two different overlaps. The overlap describes how much of the previous block is contained in the next one: the higher the overlap, the more samples of the previous block are present in the next block.
When programming an overlap algorithm, you define a so-called **hopsize**: the number of samples by which the start and end index of each block move forward. For example, with a block size of 100 samples and an overlap of 75%, the hopsize is $100 \cdot (1 - 0.75) = \frac{100}{4} = 25$ samples. Once you have calculated the hopsize, the next step is to calculate the total number of blocks for your audio data. In some cases the audio cannot be divided into a whole number of blocks, so we discard the last samples that would not fill a whole block.
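A minimal sketch of this blocking scheme, under the assumption that the hopsize is rounded to whole samples (how the page rounds is not stated); `splitIntoBlocks` is a hypothetical name. It reproduces the example above: 100-sample blocks with 75% overlap give a hopsize of 25, and trailing samples that do not fill a whole block are discarded.

```javascript
// Split a signal into overlapping blocks.
// hopSize = blockLength * (1 - overlap); incomplete trailing samples are discarded.
function splitIntoBlocks(samples, blockLength, overlapPercent) {
  const hopSize = Math.round(blockLength * (1 - overlapPercent / 100));
  if (hopSize < 1) throw new RangeError("overlap too large: hopsize must be at least 1 sample");
  const numBlocks = samples.length < blockLength
    ? 0
    : Math.floor((samples.length - blockLength) / hopSize) + 1;
  const blocks = [];
  for (let b = 0; b < numBlocks; b++) {
    const start = b * hopSize;
    blocks.push(samples.slice(start, start + blockLength));
  }
  return { hopSize, blocks };
}

// Block size 100 samples, 75 % overlap: hopsize 25
const signal = Array.from({ length: 260 }, (_, i) => i);
const { hopSize, blocks } = splitIntoBlocks(signal, 100, 75);
console.log(hopSize, blocks.length);      // 25 7
console.log(blocks[1][0], blocks[6][99]); // 25 249  (samples 250..259 are discarded)
```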

On the Audioanalyzer page you can choose between different types of spectrograms, which we call **Display-Types**. All of them are based on the free JS FFT by Nayuki. This section gives a short description of each spectrogram and introduces the underlying concept.
We only outline the theory; for more information, please refer to specialist books or professional journals.

As already mentioned, we use a basic FFT to transform the audio data from the time domain into the frequency domain. The FFT we use takes the real part and the imaginary part of the signal as two arrays of equal length; since audio data is real-valued, the imaginary part is all zeros. It returns the real and imaginary parts of the transformed signal. The Fast Fourier Transform is a fast algorithm for computing the Discrete Fourier Transform (DFT), which is defined as:

$X(n)=X(e^{j2\pi n/N}) = \sum\limits_{k=0}^{N-1}x(k)e^{-j2\pi nk/N}$

The inverse Discrete Fourier Transform (IDFT) is defined as:

$x(k)=\dfrac{1}{N}\sum\limits_{n=0}^{N-1}X(n)e^{j2\pi nk/N}$
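To connect the two formulas with code, here is a direct, $O(N^2)$ evaluation of the DFT and IDFT in JavaScript. It is only a sketch for illustration (the names `dft` and `idft` are hypothetical) and is far slower than the FFT the page uses, but like that FFT it takes and returns separate real and imaginary arrays.

```javascript
// Direct evaluation of the DFT formula above:
// X(n) = sum_k x(k) * e^(-j 2 pi n k / N), with x given as real and imaginary arrays.
function dft(re, im) {
  const N = re.length;
  const outRe = new Array(N).fill(0);
  const outIm = new Array(N).fill(0);
  for (let n = 0; n < N; n++) {
    for (let k = 0; k < N; k++) {
      const phi = -2 * Math.PI * n * k / N;
      // (re + j im) * (cos phi + j sin phi)
      outRe[n] += re[k] * Math.cos(phi) - im[k] * Math.sin(phi);
      outIm[n] += re[k] * Math.sin(phi) + im[k] * Math.cos(phi);
    }
  }
  return { re: outRe, im: outIm };
}

// Direct evaluation of the IDFT formula: x(k) = (1/N) * sum_n X(n) * e^(+j 2 pi n k / N)
function idft(re, im) {
  const N = re.length;
  const outRe = new Array(N).fill(0);
  const outIm = new Array(N).fill(0);
  for (let k = 0; k < N; k++) {
    for (let n = 0; n < N; n++) {
      const phi = 2 * Math.PI * n * k / N;
      outRe[k] += (re[n] * Math.cos(phi) - im[n] * Math.sin(phi)) / N;
      outIm[k] += (re[n] * Math.sin(phi) + im[n] * Math.cos(phi)) / N;
    }
  }
  return { re: outRe, im: outIm };
}

// A real-valued block: the imaginary part is all zeros
const x = [1, 2, 3, 4];
const X = dft(x, [0, 0, 0, 0]);
// Up to rounding: X.re = [10, -2, -2, -2], X.im = [0, 2, 0, -2]
const back = idft(X.re, X.im);
// Up to rounding: back.re = [1, 2, 3, 4], back.im = [0, 0, 0, 0]
```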

The radix-2 FFT algorithm works with block lengths that are a power of two, which is why all selectable block lengths are powers of two. Further explanations can be found in many signal processing books.
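For illustration, an iterative radix-2 FFT fits in about thirty lines. This sketch is not Nayuki's code, and `fftRadix2` is a hypothetical name. It shows why a power-of-two length is needed: the bit-reversal permutation and the butterfly stages repeatedly split the block into halves.

```javascript
// In-place iterative radix-2 FFT on separate real/imaginary arrays.
// The length must be a power of two; otherwise a RangeError is thrown.
function fftRadix2(re, im) {
  const N = re.length;
  if (N !== im.length) throw new RangeError("re and im must have the same length");
  if (N === 0 || (N & (N - 1)) !== 0) throw new RangeError("length must be a power of two");

  // Reorder the input into bit-reversed index order
  for (let i = 1, j = 0; i < N; i++) {
    let bit = N >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }

  // Butterflies: combine two transforms of size len/2 into one of size len
  for (let len = 2; len <= N; len <<= 1) {
    const angle = -2 * Math.PI / len;
    for (let start = 0; start < N; start += len) {
      for (let k = 0; k < len / 2; k++) {
        const wRe = Math.cos(angle * k); // twiddle factor e^(-j 2 pi k / len)
        const wIm = Math.sin(angle * k);
        const a = start + k;
        const b = a + len / 2;
        const tRe = re[b] * wRe - im[b] * wIm;
        const tIm = re[b] * wIm + im[b] * wRe;
        re[b] = re[a] - tRe;
        im[b] = im[a] - tIm;
        re[a] += tRe;
        im[a] += tIm;
      }
    }
  }
}

// Example: transform a real-valued block (imaginary part all zeros)
const blockRe = [1, 2, 3, 4];
const blockIm = [0, 0, 0, 0];
fftRadix2(blockRe, blockIm);
// Up to rounding: blockRe = [10, -2, -2, -2], blockIm = [0, 2, 0, -2], the same as the DFT
```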

The section **Block-Length** explains how an audio signal is split into blocks depending on the block length. For a magnitude spectrum, every block of the audio signal is transformed from the time domain into the frequency domain.
The result is complex, so you then calculate the absolute value of each complex number. The absolute value of a complex number ($z = a + \text{i}b$) is defined as:

$|z|=\sqrt{a^2 + b^2}$
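Applied to the FFT output, the absolute value is computed per bin. A minimal sketch, assuming the real and imaginary parts are separate arrays as returned by the FFT described above; `magnitude` is a hypothetical name.

```javascript
// |z| = sqrt(a^2 + b^2) for every bin, with a = real part and b = imaginary part
function magnitude(re, im) {
  return re.map((a, k) => Math.sqrt(a * a + im[k] * im[k]));
}

console.log(magnitude([3, 0, -5], [4, -1, 12])); // [ 5, 1, 13 ]
```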

A spectrogram of your signal shows three dimensions: time (x-axis), frequency (y-axis) and the color-coded magnitude. Regarding frequency, you need to know the Nyquist theorem, which is essential when you work with sampled data (e.g. a sampling rate of $f_{\text{s}} = 44.1$ kHz). It implies that the maximum representable frequency ($f_{\text{max}}$) is half the sampling rate:

$f_{\text{max}}=\dfrac{f_{\text{s}}}{2}$
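For a real-valued audio block of length $N$, the FFT bins $k = 0 \ldots N/2$ lie at $k f_{\text{s}}/N$ and cover 0 Hz up to $f_{\text{max}}$; the upper half of the output is the complex-conjugate mirror of the lower half. A short sketch (the helper name is hypothetical) that lists the frequencies of the displayable bins:

```javascript
// Center frequencies of the bins 0 .. N/2, i.e. from 0 Hz up to fs/2
function binFrequencies(sampleRate, blockLength) {
  const numBins = blockLength / 2 + 1;
  return Array.from({ length: numBins }, (_, k) => k * sampleRate / blockLength);
}

const freqs = binFrequencies(44100, 1024);
console.log(freqs.length, freqs[1], freqs[freqs.length - 1]); // 513 43.06640625 22050
```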

If you want to analyze the phase of an audio signal, you can calculate the phase (argument) of each complex number. The phase data is circular and ranges from $-\pi$ to $\pi$. The phase spectrogram of an audio signal is quite hard to interpret: you may detect a structure, but often it looks very confusing. To display the data, a circular colormap (e.g. twilight) is an advantage.

$\text{arg}(z)=\text{atan2}(b,a)$
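In JavaScript, `Math.atan2(y, x)` takes the imaginary part first. A minimal sketch computing the phase of every bin from separate real and imaginary arrays (the function name `phase` is hypothetical):

```javascript
// Phase of each bin, arg(z) = atan2(imaginary part, real part), in radians (-pi .. pi]
function phase(re, im) {
  return re.map((a, k) => Math.atan2(im[k], a));
}

console.log(phase([1, 0, -1], [0, 1, 0]).map((p) => p / Math.PI)); // [ 0, 0.5, 1 ]
```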

To obtain a spectrogram that is easier to interpret from the phase, you can calculate the group delay ($\tau_{\omega}$) [2]. Like the phase data, the result is circular, and the phase data has to be unwrapped before the calculation. The group delay is defined as the negative derivative of the phase with respect to frequency. It is given in milliseconds, and its value range depends on the processed audio data.

$\tau_{\omega}=-\dfrac{\partial\theta(\omega)}{\partial\omega}$
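A discrete version of this definition can be sketched by unwrapping the phase of one block along the frequency axis and taking a forward difference between neighboring bins, whose spacing is $\Delta\omega = 2\pi f_{\text{s}}/N$. This is an assumption about how the derivative could be approximated, not necessarily how the page or [2] computes it; all function names are hypothetical. As a check, a pure delay of 32 samples at $f_{\text{s}} = 32$ kHz must give a group delay of 1 ms in every bin.

```javascript
// Unwrap a phase sequence: remove jumps of multiples of 2*pi between neighbors
function unwrap(phase) {
  const out = [phase[0]];
  for (let k = 1; k < phase.length; k++) {
    let delta = phase[k] - phase[k - 1];
    delta -= 2 * Math.PI * Math.round(delta / (2 * Math.PI));
    out.push(out[k - 1] + delta);
  }
  return out;
}

// Group delay in ms from the (wrapped) phases of bins 0 .. N/2 of one block
function groupDelayMs(phase, sampleRate, blockLength) {
  const theta = unwrap(phase);
  const deltaOmega = 2 * Math.PI * sampleRate / blockLength; // bin spacing in rad/s
  const tau = [];
  for (let k = 0; k < theta.length - 1; k++) {
    tau.push(-1000 * (theta[k + 1] - theta[k]) / deltaOmega); // forward difference
  }
  return tau;
}

// Check: a pure delay of d samples has the phase -2*pi*k*d/N, wrapped into (-pi, pi]
const N = 256, fs = 32000, delaySamples = 32;
const wrapped = Array.from({ length: N / 2 + 1 }, (_, k) => {
  const t = -2 * Math.PI * k * delaySamples / N;
  return Math.atan2(Math.sin(t), Math.cos(t));
});
const tau = groupDelayMs(wrapped, fs, N);
console.log(tau[0].toFixed(3), tau[100].toFixed(3)); // 1.000 1.000
```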

Another interpretable representation of the phase data is the Instantaneous Frequency Deviation (IFD). The IFD can also be used for automatic speech/speaker recognition (ASR). When analyzing speech, for example, the pitch and its harmonics appear where the IFD crosses zero. To calculate the IFD, you need the Instantaneous Frequency (IF) $\nu(\omega,t)$, which is defined as the derivative of the phase with respect to time. The IFD is obtained by subtracting the angular frequency $\omega$ from the IF. The displayed value range is fixed from -125 Hz to 125 Hz. The IFD spectrogram is comparable to the magnitude spectrogram. The continuous IF and IFD $\psi(\omega,t)$ are mathematically defined as follows [1]:

$\nu(\omega,t)=\dfrac{\partial\theta(\omega,t)}{\partial t}$

$\psi(\omega,t)=\nu(\omega,t)-\omega$
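In discrete form, a common way to estimate the IFD (used for example in phase-vocoder analysis) is to compare the phase advance of a bin between two consecutive blocks with the advance expected at the bin's center frequency. The sketch below follows that approach; it is an assumption, not necessarily the page's implementation, the helper names are hypothetical, and the result is given in Hz (i.e. $\psi/2\pi$). With a hopsize $H$, this estimate is only unambiguous within $\pm f_{\text{s}}/(2H)$, about ±43 Hz for $N = 1024$ and 50% overlap at 44.1 kHz.

```javascript
// Wrap an angle (radians) into the range -pi .. pi
function princarg(angle) {
  return angle - 2 * Math.PI * Math.round(angle / (2 * Math.PI));
}

// Phase of DFT bin k of a Hann-windowed block of x starting at sample `start`
function binPhase(x, start, blockLength, k) {
  let re = 0;
  let im = 0;
  for (let n = 0; n < blockLength; n++) {
    const w = 0.5 * (1 - Math.cos(2 * Math.PI * n / (blockLength - 1)));
    const phi = -2 * Math.PI * k * n / blockLength;
    re += w * x[start + n] * Math.cos(phi);
    im += w * x[start + n] * Math.sin(phi);
  }
  return Math.atan2(im, re);
}

// IFD of bin k in Hz from the phases of two blocks hopSize samples apart:
// measured phase advance minus the advance expected at the bin center frequency
function ifdHz(prevPhase, currPhase, k, sampleRate, blockLength, hopSize) {
  const expectedAdvance = 2 * Math.PI * k * hopSize / blockLength;
  const deviation = princarg(currPhase - prevPhase - expectedAdvance);
  return deviation * sampleRate / (2 * Math.PI * hopSize);
}

// A tone 10 Hz above the center of bin 10 (fs = 44.1 kHz, N = 1024, 50 % overlap)
const fs = 44100, N = 1024, hop = 512, k = 10;
const f = k * fs / N + 10;
const x = Array.from({ length: N + hop }, (_, n) => Math.cos(2 * Math.PI * f * n / fs));
const ifd = ifdHz(binPhase(x, 0, N, k), binPhase(x, hop, N, k), k, fs, N, hop);
console.log(ifd.toFixed(1)); // 10.0
```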

Colormaps are used to show the third dimension in the displayed time-frequency plots, such as the classic amplitude spectrogram or the group delay spectrogram. In the amplitude spectrogram, for example, the colormap encodes the amplitude of each frequency bin at each point in time. Three types of colormaps are included in this page.

Linear or sequential colormaps are designed so that the lightness of the color increases or decreases linearly through the whole colormap. This makes the graphics straightforward to interpret: the lighter the color,
the higher the value it represents. The three linear colormaps in the Audioanalyzer are **viridis**, **plasma** and **gray**. They are only available for the amplitude spectrogram.

The rainbow colormap **Jet** is well known because it was the default colormap in many plotting programs in past years. We do not recommend using it, because its lightness is not as even as that of the sequential colormaps,
but we included it because it has been widely used. This colormap, too, can only be used for the amplitude spectrogram.

Cyclic colormaps are used to display cyclic data such as phase information. The phase of a signal goes from -π to π, where both values represent the same phase. To reflect this, the first and the last color of a cyclic colormap are exactly
the same. The cyclic colormap used for phase and group delay data is **twilight**. Twilight starts with white, goes through blue to black at the center, then through red back to white. Twilight is used for the group delay and phase spectrograms, because
in both spectrograms most values lie at the two ends of the value range.

**Sunlight** is very similar to twilight, but here white is at the center of the scale. Sunlight is used for the instantaneous frequency deviation spectrogram, because there most values lie at the center of the scale.

By choosing the value range for the spectrogram, you define which range of values is mapped onto the colormap. Every data point with a value above the maximum is shown in the color of the maximum value, and every data point below the minimum in the color of the minimum value. This can make details more visible, as shown in Figure 9: in part B of the figure, the lines of the harmonics around 12.5 kHz are much easier to see than in part A.
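Clipping a value to the chosen range and mapping it onto a colormap can be sketched as follows. The function names are hypothetical, the gray colormap is reduced to a linear ramp from black to white, and the range of -90 dB to -30 dB in the example is an assumed setting, not a default of the page.

```javascript
// Map a spectrogram value onto a position 0 .. 1 of the colormap.
// Values above maxValue get the color of maxValue, values below minValue
// the color of minValue (clipping).
function colormapPosition(value, minValue, maxValue) {
  const clipped = Math.min(Math.max(value, minValue), maxValue);
  return (clipped - minValue) / (maxValue - minValue);
}

// Gray colormap: lightness rises linearly with the position
function grayColor(position) {
  const level = Math.round(255 * position);
  return `rgb(${level}, ${level}, ${level})`;
}

// Example with an assumed value range of -90 dB .. -30 dB
const colors = [-120, -60, -10].map((v) => grayColor(colormapPosition(v, -90, -30)));
console.log(colors.join(", ")); // rgb(0, 0, 0), rgb(128, 128, 128), rgb(255, 255, 255)
```

Narrowing the range spreads the remaining values over the full lightness scale, which is why weak details become more visible.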

You can zoom into the spectrogram window by pressing the buttons in its lower right corner. The vertical buttons zoom the frequency axis, and the horizontal buttons zoom the time axis.

To save the spectrum data of your audio signal, use the save button. It only works if you have chosen the Display-Type "spectrum" and have not zoomed into the spectrogram.

[1] Stark, A. P. & Paliwal, K. K. (2008). "Speech Analysis Using Instantaneous Frequency Deviation". Interspeech 2008

[2] Stark, A. P. & Paliwal, K. K. (2009). "Group-Delay-Deviation Based Spectral Analysis of Speech". Interspeech 2009

[3] Harris, F. J. (1978). "On the use of windows for harmonic analysis with the discrete Fourier transform". Proceedings of the IEEE, 66(1), 51–83.

[4] Gade, S. & Herlufsen, H. (1987). "Use of Weighting Functions in DFT/FFT Analysis (Part I)". Brüel & Kjær Technical Review, No. 3.