Laboratory: pitch and formant analysis using WaveSurfer
Please read the lecture about the voice and its
timbre for the definitions of the various parameters and phenomena
mentioned here. There you will also find a link to the WaveSurfer download
page.
First of all, you need a file containing some pronounced vowels. You can
record yourself using WaveSurfer, or use a previously recorded wave file.
In either case, we suggest recording a spoken version of each vowel and a
sung version at different pitches, to make the exercise more interesting.
Use a short mono recording at a sampling frequency of 16 kHz, which is quite
enough. You can also record a musical instrument, of course. Winds and bowed
strings work best, because of the pronounced formants due to the sound box,
the tube or the bell.
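If you start from a previously recorded wave file, it can help to check that it really is mono at 16 kHz before opening it. The following is a minimal sketch using only the Python standard library; the function names `describe_wav` and `matches_lab_format` are made up for this illustration, and the sketch assumes an uncompressed PCM WAV file.

```python
import wave


def describe_wav(path):
    """Return basic format information for an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wf:
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": 8 * wf.getsampwidth(),
            "duration_s": n_frames / rate,
        }


def matches_lab_format(info, rate_hz=16000):
    """True if the recording is mono at the suggested sampling frequency."""
    return info["channels"] == 1 and info["sample_rate_hz"] == rate_hz
```

For example, `matches_lab_format(describe_wav("vowels.wav"))` (the file name is hypothetical) tells you whether the recording already has the suggested format.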
WaveSurfer is multiplatform, so in the following we will use
platform-neutral terminology (at least for Windows and MacOS),
perhaps to the detriment of precision.
Now install WaveSurfer. Strictly speaking, it doesn't require any installation
at all: simply decompress the executable into any reasonable folder, then
put a link (alias) to the executable on the desktop so that you can launch it
easily.
This is the WaveSurfer main Window:
Keep in mind that WSRF (as we will call it from now on)
is configurable: it can be adapted to the
specific task you have in mind.
Create a new file using the corresponding toolbar button
or the menu File->New, or open an existing file using its toolbar button.
In either case, you will see a configuration window listing the standard
or previously saved WSRF configurations.

Select "Speech analysis".
WSRF will then place the panes described below on your screen. If you have
opened an existing file, you will first have to wait a while for WSRF to
finish its calculations.

Waveform Pane: This
is a classic waveform plot, such as you can find in any sound editor.
This pane is synchronized with the following two panes and shows only
the portion of the file on which you have zoomed in. The last pane
behaves differently: it always shows the entire file.
Spectrogram
and formants Pane: synchronized with the waveform pane, it shows
the spectrogram with the first four formant tracks superimposed: red for
F1, green for F2, blue for F3, yellow for F4. You can change these
colors, if you want, in the preferences pane (see later).
Pitch Pane: This
is the plot of the pitch (note that it is the pitch, not the
fundamental: the algorithms used detect the pitch even if the
fundamental is missing, from the even spacing between partials).
Waveform
"overall" Pane: Here the entire file is always visible as a
waveform. The selected portion is shown in relief and acts as a button
you can move around to see different portions of the file in the first
three panes.
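The remark in the Pitch Pane description, that the pitch can be found even when the fundamental is missing, can be tried out numerically. The sketch below is not the tracker WSRF uses; it is a plain autocorrelation estimate, chosen only because it is short. It builds a signal from the 2nd, 3rd and 4th partials of 200 Hz (so there is no energy at 200 Hz) and looks for the lag at which the signal best repeats itself. All names and parameter values are illustrative assumptions.

```python
import math


def synth_partials(f0, harmonics, rate=16000, n=800):
    """Sum of equal-amplitude sine partials at the given multiples of f0."""
    return [
        sum(math.sin(2 * math.pi * h * f0 * t / rate) for h in harmonics)
        for t in range(n)
    ]


def autocorr_pitch(x, rate=16000, fmin=80.0, fmax=500.0):
    """Estimate the pitch as rate / lag, where lag maximizes the autocorrelation."""
    lo, hi = int(rate / fmax), int(rate / fmin)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        val = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return rate / best_lag


# Partials at 400, 600 and 800 Hz only: the 200 Hz fundamental is absent.
signal = synth_partials(200.0, harmonics=[2, 3, 4])
estimate = autocorr_pitch(signal)
print(estimate)
```

The partials are spaced 200 Hz apart, so the waveform repeats every 1/200 s, and the estimate lands on 200 Hz although no partial sits at that frequency.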
How to record a file:
Check File->Preferences
.... Make sure that every parameter is OK, in particular that the selected
audio input and output are the ones you want. Press the classic red button,
wait for the system to start, then speak (or play). When finished, press the
square button (as on any real or virtual recorder).
How to select a portion of the waveform visible in the first three panes:
Click on the left end point, then
Shift-click on the right end point, or drag from the left end point. The
selected zone is shown with a light yellow background.
How to zoom in:
Menu View->Zoom to selection.
How to adjust the visibility of the spectrogram:
Two factors affect the appearance of the spectrogram: the window
/ frame length, and the
contrast / brightness. As to the frame length, keep in mind that
the spectrum is discrete, showing a point for every frequency that is a
multiple of the inverse of the frame length. If your signal contains a
frequency that exactly coincides with this grid, it is shown as a point at
maximum amplitude. If a frequency falls between two adjacent points, it
is seen as a (sampled) sinc and the peak value is lowered
(to be precise, you will find two peak values). Moreover, the
longer the frame, the better the frequency resolution, and the more
likely it is that any frequency actually present comes close to an existing
grid point.
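The effect of the frequency grid can be checked with a few lines of code. The sketch below is only an illustration, not what WSRF computes internally: it assumes a plain rectangular frame of 160 samples at 16 kHz (so the grid step is 16000 / 160 = 100 Hz) and a direct, unoptimized DFT. One sine sits exactly on the grid at 1000 Hz, the other halfway between two grid points at 1050 Hz.

```python
import cmath
import math


def dft_magnitudes(frame):
    """DFT magnitudes up to Nyquist, scaled so that a unit-amplitude
    sine lying exactly on the grid reads 1.0."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc) * 2 / n)
    return mags


def sine_frame(freq_hz, rate=16000, n=160):
    """One rectangular (unwindowed) frame of a unit-amplitude sine."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n)]


RATE, N = 16000, 160
grid_step_hz = RATE / N  # spacing between spectrum points, in Hz

on_grid = dft_magnitudes(sine_frame(1000.0))   # exactly on grid point 10
off_grid = dft_magnitudes(sine_frame(1050.0))  # halfway between points 10 and 11

print("on grid :", round(on_grid[10], 3))
print("off grid:", round(off_grid[10], 3), round(off_grid[11], 3))
```

The on-grid sine shows a single point at full amplitude. The off-grid sine shows two lowered peaks of nearly equal height at the neighbouring points (close to 2/π of the true amplitude), with the rest of its energy spread over the other points. With a frame ten times longer the grid step drops to 10 Hz, and 1050 Hz falls exactly on a grid point again.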
To modify the window length
and brightness / contrast, right-click on the pane (supposedly
Apple-click on the Macintosh). This is the interesting two-column menu
that will appear:

Left:
Spectrum Section: shows a window with the power spectrum.
LTAS: shows a window with the Long Term Average Spectrum.
Spectrogram Controls ...: shows this window:

You can modify the window
length and brightness / contrast and see the effects in real time.
Right:
Create Pane, Delete Pane:
you can add panes to your window or delete them.
Apply configuration /
Save configuration: you can save the configuration (choice of panes) for
future reference, or apply an existing one.
Properties: shows / modifies
every parameter related to the pane on which you clicked.
Open Data File: you
can choose the file to show in the pane.
Save Data File: saves
the data shown in the pane to a file. Formants go into a file
with suffix frm, pitch into a file with suffix f0. They are
all plain ASCII text files, one row per frame.
File formats:
You can choose the column
separator in the pane properties: Properties->Data Plot->Column delimiter
Pitch File (.f0):
Each
row is a frame (by default 10 msec). You
can change the frame length in Pane Properties->Pitch Contour->Frame
interval.
If the algorithm used for
pitch tracking is ESPS (default), each row
has four columns: pitch
in Hz, probability of voicing, a local root mean square (RMS) measurement,
and the normalized peak value of the cross-correlation found when computing
the pitch.
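Since the .f0 file is plain text, it is easy to process outside WSRF. The following is a minimal sketch that reads the four columns described above and averages the pitch over the voiced frames. The field names, the 0.5 voicing threshold, and the convention that frame i starts at i times the frame interval are assumptions made for this illustration; the sample rows are invented, not taken from a real recording.

```python
from typing import NamedTuple


class PitchFrame(NamedTuple):
    time_s: float         # assumed start time: frame index * frame interval
    pitch_hz: float       # column 1
    voicing: float        # column 2, probability of voicing
    local_measure: float  # column 3, as described in the text above
    xcorr_peak: float     # column 4


def parse_f0(text, frame_interval_s=0.010, delimiter=None):
    """Parse the contents of a .f0 file; delimiter=None splits on whitespace."""
    frames = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cols = [float(c) for c in line.split(delimiter)]
        if len(cols) != 4:
            raise ValueError(f"expected 4 columns, got {len(cols)}: {line!r}")
        frames.append(PitchFrame(len(frames) * frame_interval_s, *cols))
    return frames


def mean_voiced_pitch(frames, threshold=0.5):
    """Average pitch over frames whose voicing value reaches the threshold."""
    voiced = [f.pitch_hz for f in frames if f.voicing >= threshold]
    return sum(voiced) / len(voiced) if voiced else None


# Invented sample: one unvoiced frame followed by two voiced ones.
sample = """0.0 0.0 35.2 0.21
200.0 1.0 1510.4 0.93
210.0 1.0 1622.9 0.95
"""
frames = parse_f0(sample)
print(mean_voiced_pitch(frames))
```

If you changed the column delimiter in the pane properties, pass the same character as `delimiter`; if you changed the frame interval, pass it as `frame_interval_s`.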
Formants file (.frm):
Each row corresponds to a frame
(default 10 msec). You can change
the frame length in Pane Properties->Formants->Frame
interval. One column per formant. Frequencies in Hz.
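A typical use of the .frm file is to average the formant frequencies over the stretch of time occupied by one vowel. The sketch below makes no assumption about how many columns a row has; it simply averages each column over the frames that fall inside a time range. The function names, the frame-index arithmetic (frame i is assumed to start at i times the frame interval) and the sample values are all invented for illustration.

```python
def parse_frm(text, delimiter=None):
    """Parse the contents of a .frm file into one list of floats per frame."""
    return [
        [float(c) for c in line.split(delimiter)]
        for line in text.splitlines()
        if line.strip()
    ]


def mean_formants(rows, start_s, end_s, frame_interval_s=0.010):
    """Column-wise mean over the frames starting in [start_s, end_s)."""
    first = round(start_s / frame_interval_s)
    last = round(end_s / frame_interval_s)
    chunk = rows[first:last]
    if not chunk:
        return []
    n_cols = min(len(r) for r in chunk)
    return [sum(r[c] for r in chunk) / len(chunk) for c in range(n_cols)]


# Invented sample: two frames of one vowel, then one frame of another.
sample = """700 1200 2500 3500
720 1180 2550 3450
300 2200 2900 3600
"""
rows = parse_frm(sample)
print(mean_formants(rows, 0.0, 0.02))
```

The time range can be read off the selection you made in the waveform pane, so the averages refer to the same portion of the file that you see on screen.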
Critical notes:
WSRF, when tracking formants,
is limited to detecting the frequencies alone. This is
probably all you need in linguistics. From the point of view of timbre, we
know that more parameters are important and necessary: the peak gain
value, the Q, the presence of antiresonances (for nasality, but not only
for that). For instance, the body of many musical instruments (such as
the violin, viola, cello ...) shows a resonance-antiresonance pair at low
frequency (relative to the tessitura of the instrument) due to the
resonance of the air volume of the box. This behavior is similar to that of
quartz crystals or LRC circuits, and the LPC analysis currently used cannot
capture it.
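The reason LPC cannot capture an antiresonance can be stated in one line. In its usual textbook form, LPC fits an all-pole filter to the signal, so it has only denominator roots (resonances) at its disposal, while a resonance-antiresonance pair also needs numerator roots (zeros). The notation below (gain G, orders p and q, coefficients a_k and b_k) is the standard one and is not taken from the WSRF documentation.

```latex
% All-pole (LPC) model of order p: resonances only
H_{\mathrm{LPC}}(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}

% Pole-zero model: the zeros of the numerator can represent antiresonances
H_{\mathrm{PZ}}(z) = G \, \frac{1 + \sum_{k=1}^{q} b_k z^{-k}}{1 - \sum_{k=1}^{p} a_k z^{-k}}
```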
Please also look at this
analysis of the Italian vowels /a/ /e/ /i/ /o/ /u/ pronounced by
myself (first sequence spoken, "vNI" in the annotation pane;
second sequence sung, "vI"). Look at the /o/ in both versions.
You can see a weak formant between the second and the third ones that
is not recognized as such, and in this case the second formant comes very
close to the first one (much as happens for the /u/).
Attention! Since
the automatic tracker can go wrong, WSRF allows you to correct the
formant profile by hand, simply by drawing on the track with the mouse.
Do not use this pane to select zones, because you would overwrite the
formants.
Note also the
unintentional (or better, "spontaneous") coloratura
(ascending pitch at the beginning) in the sung vowels.
