Baz

Reputation: 13135

Extracting pitch from singing voice

I'd like to extract the pitch from a singing voice. The track in question contains only a single voice and no other sounds.

I want to know the loudness and perceived pitch frequency at a given point in time. So something like the following:

0.0sec 400Hz -20dB
0.1sec 401Hz -9dB
0.2sec 403Hz -10dB
0.3sec 403Hz -10dB
0.4sec 404Hz -11dB
0.5sec 406Hz -13dB
0.6sec 410Hz -15dB
0.7sec 411Hz -16dB
0.8sec 409Hz -20dB
0.9sec 407Hz -24dB
1.0sec 402Hz -34dB

How might I achieve such an output? I'm interested in slight changes in frequency as opposed to a specific note value. I have some DSP knowledge and I can program in C++ and Python, but I'd like to avoid reinventing the wheel if possible.

Upvotes: 0

Views: 2115

Answers (2)

user1044532

I suggest you read this article: http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf. It describes YIN, which is one of the simplest methods of pitch detection, and it works very well. Also, for measuring the instantaneous power of the signal, you can square the signal, smooth it (usually with a first-order low-pass filter), and take the square root, which gives a running RMS value. (For a pure sinusoid, that is the peak amplitude divided by √2.) I hope this helps. Good luck!
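
As a rough illustration of both suggestions, here is a minimal pure-Python sketch, not the paper's reference implementation. `yin_pitch` follows the core YIN steps (difference function, cumulative mean normalized difference, absolute threshold, parabolic interpolation) and `rms_db` is a running RMS level using a one-pole low-pass. The function names, the `fmin`/`fmax` search range, the 0.1 threshold and the 50 ms time constant are all assumptions chosen for the example.

```python
import math

def yin_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Estimate f0 in Hz for one frame (simplified YIN); None if no pitch found."""
    tau_min = max(2, int(sample_rate / fmax))
    tau_max = int(sample_rate / fmin)
    window = len(frame) - tau_max          # samples compared at every lag
    if window <= 0:
        raise ValueError("frame is too short for the requested fmin")

    # Step 1: difference function d(tau)
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((frame[i] - frame[i + tau]) ** 2 for i in range(window))

    # Step 2: cumulative mean normalized difference d'(tau)
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0.0 else 1.0

    # Step 3: first lag that dips below the threshold ...
    tau = None
    for t in range(tau_min, tau_max):
        if cmnd[t] < threshold:
            tau = t
            break
    if tau is None:
        return None                        # treat the frame as unvoiced
    # ... then walk down to the bottom of that dip
    while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
        tau += 1

    # Step 4: parabolic interpolation for sub-sample lag resolution
    a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
    denom = a - 2.0 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0.0 else 0.0
    return sample_rate / (tau + shift)

def rms_db(samples, sample_rate, time_constant=0.05, floor=1e-12):
    """Running RMS level in dB (relative to full scale 1.0), one value per sample."""
    alpha = 1.0 - math.exp(-1.0 / (time_constant * sample_rate))
    power = 0.0
    levels = []
    for x in samples:
        power += alpha * (x * x - power)   # first-order low-pass on x^2
        levels.append(10.0 * math.log10(max(power, floor)))  # 10*log10(power) = 20*log10(rms)
    return levels
```

Calling `yin_pitch` on successive frames (say every 0.1 s) and reading `rms_db` at the same instants would give rows like the ones in the question. On a unit-amplitude 220 Hz sine sampled at 8 kHz, this sketch should report roughly 220 Hz and about -3 dB. A NumPy version would be much faster for real recordings.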

Upvotes: 1

hotpaw2

Reputation: 70693

Note that slight changes in frequency in Hz and in perceived pitch may not be the same thing. Perceived pitch resolution seems to vary with absolute frequency, duration, and loudness. If you want more accuracy than perceptual resolution gives you, there might be some research papers on estimating the time between each glottal closure (probably using a deconvolution or pattern-matching technique), which would give you some sort of pitch period. The simplest pitch estimate might be some form of weighted autocorrelation, for which lots of canned algorithms and code are available.
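
To make the autocorrelation idea concrete, here is a minimal pure-Python sketch under stated assumptions. The "weighting" used is simply the triangular taper that the plain (biased) autocorrelation sum applies by itself, which favours shorter lags and helps avoid picking a multiple of the true period. The function name, the search range and the 0.5 voicing threshold are hypothetical choices for illustration, not something this answer prescribes.

```python
def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0, min_strength=0.5):
    """Estimate f0 in Hz from the strongest autocorrelation peak; None if unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]              # remove any DC offset
    energy = sum(s * s for s in x)             # autocorrelation at lag 0
    if energy == 0.0:
        return None

    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(n - 2, int(sample_rate / fmin))

    # Biased autocorrelation: fewer terms are summed at longer lags, so
    # longer lags are implicitly down-weighted by (n - lag) / n.
    lags = list(range(lag_min, lag_max + 1))
    r = [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in lags]
    if not r:
        return None

    k = max(range(len(r)), key=lambda i: r[i])
    if r[k] / energy < min_strength:
        return None                            # peak too weak to call it periodic

    # Parabolic interpolation around the peak for sub-sample lag resolution
    shift = 0.0
    if 0 < k < len(r) - 1:
        a, b, c = r[k - 1], r[k], r[k + 1]
        denom = a - 2.0 * b + c
        if denom != 0.0:
            shift = 0.5 * (a - c) / denom
    return sample_rate / (lags[k] + shift)
```

The interpolation step matters for the use case in the question: without it the estimate is quantised to `sample_rate / lag`, which at 8 kHz is a step of several Hz around 400 Hz.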

Since dB is a log scale, a level in dB might be somewhat closer to perceived loudness, but it has to be spectrally weighted with some perceptual frequency response curve over some duration of measurement.
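
One commonly used perceptual curve is A-weighting. Choosing it here is an assumption, since the answer does not name a specific curve, and it is only a coarse stand-in for a proper loudness model. The sketch below evaluates the standard analytic A-weighting gain and applies it to a list of spectral components; `weighted_level_db` and its `(frequency, level)` input format are hypothetical helpers for illustration.

```python
import math

def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz (approximately 0 dB at 1 kHz)."""
    if freq_hz <= 0.0:
        return float("-inf")
    f2 = freq_hz * freq_hz
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00

def weighted_level_db(components):
    """Combine (freq_hz, level_db) spectral components into one A-weighted level."""
    total_power = sum(
        10.0 ** ((level_db + a_weighting_db(freq_hz)) / 10.0)
        for freq_hz, level_db in components
    )
    return 10.0 * math.log10(total_power) if total_power > 0.0 else float("-inf")
```

In practice the components would come from the magnitude spectrum of a short analysis window, so the "duration of measurement" is set by the window length and any smoothing applied across windows.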

There seem to be research papers on both of these topics, and many textbooks cover human audio perception and common audio DSP techniques.

Upvotes: 1
