I am trying to determine the perceived pitch of an audio sample (voice only, no background noise or music) and then classify the voice as bass, tenor, alto, mezzo-soprano, or soprano.
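As a rough sketch of that last classification step, assuming a single representative frequency has already been computed, a lookup against ascending upper bounds could look like the block below. The function name `classify_voice` and every boundary value are hypothetical placeholders, not established speaking-voice ranges; they would have to be calibrated against labelled recordings.

```php
<?php
/**
 * Map a representative frequency (Hz) to a voice-type label.
 *
 * $bands maps each label to the upper bound (Hz) of its range, in
 * ascending order. The first band whose bound exceeds $hz is returned;
 * anything above the last bound falls back to the last label.
 */
function classify_voice(float $hz, array $bands): string
{
    foreach ($bands as $name => $upper) {
        if ($hz < $upper) {
            return $name;
        }
    }
    return array_key_last($bands);
}

// PLACEHOLDER boundaries for illustration only; these are not
// calibrated values for speaking voices.
$bands = [
    'bass'          => 110.0,
    'tenor'         => 150.0,
    'alto'          => 190.0,
    'mezzo-soprano' => 220.0,
    'soprano'       => INF,
];
```

Keeping the boundaries in a data array rather than in `if` branches makes them easy to tune once real measurements are available.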
To do so, I use aubio, which returns a list of timecodes and the corresponding frequency for any given audio file.
I am struggling to find the best way to use this data to determine the pitch. My initial approach is either simply not a good idea or badly executed:
```php
// Run aubiopitch and capture its output (one line per analysis frame).
exec('aubiopitch ' . escapeshellarg('/pathtomp3file/audio.mp3'), $output);

// $output is a list of number pairs (one pair per line):
// the timecode, followed by whitespace, followed by the
// frequency (in Hz) detected at that timecode.
$freqs = [];
foreach ($output as $sample) {
    // split the line into timecode and frequency
    $parts = preg_split('/\s+/', trim($sample));
    if (count($parts) < 2) {
        continue; // skip empty or malformed lines
    }
    // add the frequency to the array
    $freqs[] = floor((float) $parts[1]);
}

// to calculate the median frequency: sort the frequencies
// and fetch the element in the middle
sort($freqs);
$median = $freqs[intdiv(count($freqs), 2)];
```
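The middle-element lookup above returns the upper of the two middle values when the number of frames is even, and it does not handle an empty list (for example, when aubio produces no output). A small helper along these lines covers both cases; `median_of` is a hypothetical name and this is only a sketch:

```php
<?php
/**
 * Median of a list of numbers.
 * Returns null for an empty list and the mean of the two middle
 * values when the count is even.
 */
function median_of(array $values): ?float
{
    $n = count($values);
    if ($n === 0) {
        return null; // no data, e.g. no frames were returned
    }
    sort($values, SORT_NUMERIC); // sorts a local copy and reindexes 0..n-1
    $mid = intdiv($n, 2);
    if ($n % 2 === 1) {
        return (float) $values[$mid];
    }
    return ($values[$mid - 1] + $values[$mid]) / 2.0;
}
```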
Unfortunately, the results are inconsistent. Too often, the median frequency of a very deep voice, for example, comes out far too high.
I believe my way of determining the fundamental frequency is flawed, but I am struggling to come up with a better approach.
The following questions arise, for example:
Should I discard any frequencies above, for example, 400 Hz, since they probably come from sounds like "s"?
When humans perceive the pitch of a voice, what is it that we are actually listening for? The fundamental frequency? The energy of certain frequencies?
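For the first question, a minimal sketch of discarding out-of-range values before taking the median might look like this. It assumes aubio reports 0 Hz for frames in which it finds no pitch (such as silence), which a lower bound above zero would remove. The function name `keep_plausible_pitches` and the 50 Hz lower bound are hypothetical; 400 Hz is simply the cutoff mentioned in the question, not a recommended value.

```php
<?php
/**
 * Keep only frames whose frequency lies within [$minHz, $maxHz].
 *
 * A lower bound above 0 drops frames reported as 0 Hz (no pitch found);
 * the upper bound drops estimates above the chosen cutoff.
 * The default bounds are HYPOTHETICAL, not values taken from aubio.
 */
function keep_plausible_pitches(array $freqs, float $minHz = 50.0, float $maxHz = 400.0): array
{
    return array_values(array_filter(
        $freqs,
        fn ($f) => $f >= $minHz && $f <= $maxHz
    ));
}

// Usage with the $freqs array built by the loop in the question:
// $voiced = keep_plausible_pitches($freqs);
// sort($voiced);
// $median = $voiced[intdiv(count($voiced), 2)];
```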
The overall question that sums it up would be:
"Using aubio's data, what is the correct programmatic approach to calculating the perceived pitch of a voice recording (talking, not singing)?"
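Because pitch perception is roughly logarithmic (each doubling of frequency is heard as one octave), it can help to compare frequencies on a semitone scale rather than in raw hertz. A minimal sketch using the MIDI note-number convention, where A4 = 440 Hz is note 69; `hz_to_midi` is a hypothetical helper name:

```php
<?php
/**
 * Convert a frequency in Hz to a fractional MIDI note number.
 * 69 corresponds to A4 = 440 Hz; adding 12 moves up one octave.
 * Returns null for non-positive input (e.g. 0 Hz "no pitch" frames).
 */
function hz_to_midi(float $hz): ?float
{
    if ($hz <= 0.0) {
        return null;
    }
    return 69.0 + 12.0 * log($hz / 440.0, 2);
}
```

Because the conversion is monotonic, the middle-element median picks the same frame on either scale; averages and spreads, however, differ between hertz and semitones.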
EDIT – HOW I USE AUBIO
```php
exec('aubiopitch /pathtomp3file/audio.mp3',$output);
```