Reputation: 361
I used Harmonic Product Spectrum to find the fundamental note present when a number of harmonics are present. This is the code I've implemented;
[song,FS] = wavread('C major.wav');
%sound(song,FS);
P = 20000;
N=length(song); % length of song
t=0:1/FS:(N-1)/FS; % define time period
song = sum(song,2);
song=abs(song);
%----------------------Finding the envelope of the signal-----------------%
% Gaussian Filter
w = linspace( -1, 1, P); % create a vector of P values between -1 and 1 inclusive
sigma = 0.335; % standard deviation used in Gaussian formula
myFilter = -w .* exp( -(w.^2)/(2*sigma.^2)); % compute first derivative, but leave constants out
myFilter = myFilter / sum( abs( myFilter ) ); % normalize
% fft convolution
myFilter = myFilter(:); % create a column vector
song(length(song)+length(myFilter)-1) = 0; %zero pad song
myFilter(length(song)) = 0; %zero pad myFilter
edges =ifft(fft(song).*fft(myFilter));
tedges=edges(P:N+P-1); % shift by P/2 so peaks line up w/ edges
tedges=tedges/max(abs(tedges)); % normalize
%---------------------------Onset Detection-------------------------------%
% Finding peaks
maxtab = [];
mintab = [];
x = (1:length(tedges));
min1 = Inf;
max1 = -Inf;
min_pos = NaN;
max_pos = NaN;
lookformax = 1;
for i=1:length(tedges)
peak = tedges(i:i);
if peak > max1,
max1 = peak;
max_pos = x(i);
end
if peak < min1,
min1 = peak;
min_pos = x(i);
end
if lookformax
if peak < max1-0.07
maxtab = [maxtab ; max_pos max1];
min1 = peak;
min_pos = x(i);
lookformax = 0;
end
else
if peak > min1+0.08
mintab = [mintab ; min_pos min1];
max1 = peak;
max_pos = x(i);
lookformax = 1;
end
end
end
max_col = maxtab(:,1);
peaks_det = max_col/FS;
No_of_peaks = length(peaks_det);
[song,FS] = wavread('C major.wav');
song = sum(song,2);
%---------------------------Performing STFT--------------------------------%
h = 1;
%for i = 2:No_of_peaks
song_seg = song(max_col(7-1):max_col(7)-1);
L = length(song_seg);
NFFT = 2^nextpow2(L); % Next power of 2 from length of y
seg_fft = fft(song_seg,NFFT);%/L;
f = FS/2*linspace(0,1,NFFT/2+1);
seg_fft_2 = 2*abs(seg_fft(1:NFFT/2+1));
L5 = length(song_seg);
figure(6)
plot(f,seg_fft_2)
%plot(1:L/2,seg_fft(1:L/2))
title('Frequency spectrum of signal (seg_fft)')
xlabel('Frequency (Hz)')
xlim([0 2500])
ylabel('|Y(f)|')
ylim([0 500])
%----------------Performing Harmonic Product Spectrum---------------------%
% In harmonic prodcut spectrum, you downsample the fft data several times and multiply all those with the original fft data to get the maximum peak.
%HPS
seg_fft = seg_fft(1 : size(seg_fft,1)/2 );
seg_fft = abs(seg_fft);
a = length(seg_fft);
seg_fft2 = ones(size(seg_fft));
seg_fft3 = ones(size(seg_fft));
seg_fft4 = ones(size(seg_fft));
seg_fft5 = ones(size(seg_fft));
for i = 1:((length(seg_fft)-1)/2)
seg_fft2(i,1) = seg_fft(2*i,1);%(seg_fft(2*i,1) + seg_fft((2*i)+1,1))/2;
end
%b= size(seg_fft2)
L1 = length(seg_fft2);
NFFT1 = 2^nextpow2(L1); % Next power of 2 from length of y
f1 = FS/2*linspace(0,1,NFFT1/2+1);
seg_fft12 = 2*abs(seg_fft2(1:NFFT1/2+1));
figure(7);
plot(f1,seg_fft12)
title('Frequency spectrum of signal (seg_fft2)')
xlabel('Frequency (Hz)')
xlim([0 2500])
ylabel('|Y(f)|')
ylim([0 500])
This is the plot for Figure 6
So in the real scenario once I perform HPS (downsample by 2) the peak at 440.1 should shift down to 220 while the peak at 881 should shift down to around 440. But when I plot the graph that is not what I get. Insetad this is the graph I get,
Why don't I get the correct graph???? I don't seem to understand what Im doing wrong here... Can someone please have a look and let me know.. Thank you.....
Upvotes: 0
Views: 2666
Reputation: 46375
The problem with your downsampling is that you trim the vector by 2x before doing the downsampling, instead of after. You do
seg_fft = seg_fft(1 : size(seg_fft,1)/2 );
% [... other stuff ...]
for i = 1:((length(seg_fft)-1)/2)
seg_fft2(i,1) = seg_fft(2*i,1);%(seg_fft(2*i,1) + seg_fft((2*i)+1,1))/2;
end
Instead you need to downsample first, then trim:
for i = 1:((length(seg_fft)-1)/2)
seg_fft2(i,1) = seg_fft(2*i,1);%(seg_fft(2*i,1) + seg_fft((2*i)+1,1))/2;
end
seg_fft = seg_fft(1 : size(seg_fft,1)/2 );
UPDATE you asked why this does not preserve the peaks. The short answer is that you may not be "looking at" the peaks. If you want to keep (nearest) peaks during downsampling by n
, you can do the following:
n = 3; % degree of decimation or downsampling we want to do
N = size(seg_fft, 1); % number of samples in original FFT
Nn = n * floor(N/n); % number of samples that can be divided by n
fftBlock = reshape(seg_fft(1:Nn, 1), n, N);
fftResampled = max(fftBlock);
How does this work? Let's use a simple example of 10 x 1 points:
seg_fft = [0 1 10 5 4 3 6 12 4 3];
We want "every 3rd". Naive algorithm would give
fftResampled = [2 3 7];
But we would have liked the "peaks" [10 3 12]
- unfortunately they are not in the right location.
After reshaping the array (and losing the last element; if it might be an interesting value we could append and pad with zeros) we get:
fftBlock = [0 5 6;
1 4 3;
10 3 4];
(Remember that Matlab matrices are rows-first)
Now taking the max
(the function will operate along the first dimension unless we tell it otherwise) you get
fftResampled = [10 5 6];
i.e. always the peak. While this preserves the peaks, it does mean that your "valleys" are getting filled in a bit.
Bottom line: there is no way not to destroy "some" information in the process of downsampling - you are, after all, throwing half the samples away. What you keep, and how you account for the information content in the data you discard, is something only you can decide, as it will depend on your application, and what is important for you.
Upvotes: 1