Reputation: 21
I found this speech recognition code on a blog and downloaded it. It works fine: it asks you to record sounds to create a dataset, and then you call a function to train the system using a neural network.
I want to use this code to train on my own dataset of 20 words that I want to recognise.
Problem: I have a dataset of 800 files for twenty words, i.e. 40 recordings from different people for each word. I used Windows Sound Recorder to collect the files. The problem is that the code ALWAYS assumes an input of 8000 samples, whereas my files vary in length: some are 2 seconds long, some are 3, so each file has a different number of samples.
If the number of samples per input signal varies, it will probably generate errors. I want to use my files to train the system. How do I do that?
Code:
clc;clear all;
load('voicetrainfinal.mat');
Fs=8000;
for l=1:20
    clear y1 y2 y3;
    display('record voice');
    pause();
    x=wavrecord(Fs,Fs); % wavrecord(n,Fs) records n samples at a sampling rate of Fs
    maxval = max(x);
    if maxval<0.04
        display('Threshold value is too large!');
    end
    t=0.04;
    j=1;
    for i=1:8000
        if(abs(x(i))>t)
            y1(j)=x(i);
            j=j+1;
        end
    end
    y2=y1/(max(abs(y1)));
    y3=[y2,zeros(1,3120-length(y2))];
    y=filter([1 -0.9],1,y3');%high pass filter to boost the high frequency components
    %%frame blocking
    blocklen=240;%30ms block
    overlap=80;
    block(1,:)=y(1:240);
    for i=1:18
        block(i+1,:)=y(i*160:(i*160+blocklen-1));
    end
    w=hamming(blocklen);
    for i=1:19
        a=xcorr((block(i,:).*w'),12);%finding auto correlation from lag -12 to 12
        for j=1:12
            auto(j,:)=fliplr(a(j+1:j+12));%forming autocorrelation matrix from lag 0 to 11
        end
        z=fliplr(a(1:12));%forming a column matrix of autocorrelations for lags 1 to 12
        alpha=pinv(auto)*z';
        lpc(:,i)=alpha;
    end
    wavplay(x,Fs);
    X1=reshape(lpc,1,228);
    a1=sigmoid(Theta1*[1;X1']);
    h=sigmoid(Theta2*[1;a1]);
    m=max(h);
    p1=find(h==m);
    if(p1==10)
        P=0
    else
        P=p1
    end
end
Upvotes: 1
Views: 842
Reputation: 11
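n = noOfFiles;   % number of recordings
for k = 1:n
    % Assigning past the current width makes MATLAB grow M and zero-fill the shorter rows
    M(k, 1:length(filedata{k})) = filedata{k};
end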
:P
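The loop above relies on MATLAB growing M on the fly and filling the gaps with zeros. A minimal sketch of the same idea with M preallocated, assuming filedata is a cell array holding one recording per cell (the sample values here are made up for illustration):

```matlab
% Hypothetical stand-ins for recordings of different lengths;
% in practice each cell would hold the samples read from one file.
filedata = {[0.1; 0.2; 0.3], [0.4; 0.5], [0.6; 0.7; 0.8; 0.9]};

% Length of the longest recording.
n_max = max(cellfun(@numel, filedata));

% One row per recording, zero-padded on the right to n_max samples.
M = zeros(numel(filedata), n_max);
for k = 1:numel(filedata)
    s = filedata{k}(:).';      % force a row vector
    M(k, 1:numel(s)) = s;
end
```

Preallocating avoids repeated resizing of M, which can matter with 800 files of a few seconds each.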
Upvotes: 1
Reputation: 471
In your code you have:
Fs=8000;
wavrecord(n,Fs) % records n samples at a sampling rate Fs
for i=1:8000
if(abs(x(i))>t)
y1(j)=x(i);
j=j+1;
end
end
It seems that instead of recording you are going to import your sound file (here for a .wav file):
[x, Fs] = wavread(filename);
Instead of hardcoding the 8000 value you can read the length of your file:
n = length(x);
and then just use that n variable in the for loop:
for i=1:n
if(abs(x(i))>t)
y1(j)=x(i);
j=j+1;
end
end
The rest of the code seems to be independent of that 8000 value.
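As a side note, the sample-by-sample loop can also be written with logical indexing, which works for any signal length and needs no counter. A minimal sketch on a made-up signal (the values in x are purely illustrative; t is the threshold from the question's code):

```matlab
% Illustrative signal; in practice x would hold the samples read from a file.
x = [0.01; -0.20; 0.03; 0.50; -0.02; -0.07];
t = 0.04;                 % same threshold as in the question's code

% Keep only the samples whose magnitude exceeds the threshold.
y1 = x(abs(x) > t).';     % transpose so y1 is a row vector, as in the loop version
```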
If you are worried about the non-constant file length, compute n_max, the maximum length of all the audio recordings you have. Then pad every recording shorter than n_max samples with zeros so that they are all n_max long.
n_max = 0;
files = {'file1', 'file2'};   % ... and so on up to 'filen'; list every recording here
for k = 1:numel(files)
    [y, Fs] = wavread(files{k});
    n_max = max(n_max, length(y));
end
Then each time you process a sound vector you can pad it with zeros (harmless for you, because 0 means no sound) like so:
y = [y(:).', zeros(1, n_max - length(y))]; % y(:).' forces a row vector, since wavread returns a column
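Newer MATLAB releases replace wavread with audioread, so the read-and-pad step may need adapting. A minimal sketch, assuming audioread and audiowrite are available; the test tone, the temporary file, and the n_max value are made up so the example is self-contained:

```matlab
% Write a short illustrative tone to a temporary file (stand-in for a real recording).
Fs = 8000;
tone = 0.5 * sin(2*pi*440*(0:999).'/Fs);
fname = [tempname '.wav'];
audiowrite(fname, tone, Fs);

% Read it back; audioread returns a column vector for mono audio.
[y, FsRead] = audioread(fname);

n_max = 1500;                                  % assumed maximum length, for illustration only
y = [y(:).', zeros(1, n_max - length(y))];     % zero-pad to n_max samples as a row vector
delete(fname);
```

It is also worth checking FsRead for each file: if the recordings were not made at 8000 Hz, the frame sizes in the original code (240 samples for 30 ms) would no longer correspond to the same durations.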
Upvotes: 1