Ana Ain

Reputation: 173

How to improve the OCR accuracy of a neural network in MATLAB

I'm working on OCR for Arabic characters. I want to try GLCM as a feature extraction method. I got the code here: http://www.mathworks.com/matlabcentral/fileexchange/22187-glcm-texture-features

Example of input images (character images):

[Images: three example character images]

I've written a function to compute the GLCM features I need. Here it is:

function features = EkstraksiFitur_GLCM(x)
    % Co-occurrence matrices for four offsets (0, 45, 90 and 135 degrees), two gray levels.
    glcm = graycomatrix(x, 'Offset', [0 1; -1 1; -1 0; -1 -1], 'NumLevels', 2);

    % Texture statistics from the File Exchange function GLCM_Features1.
    stats = GLCM_Features1(glcm, 0);

    % Average each statistic over the offsets and replace NaN with 0.
    autocorrelation = double(mean(stats.autoc));
    if isnan(autocorrelation)
        autocorrelation = 0;
    end

    contrast = double(mean(stats.contr));
    if isnan(contrast)
        contrast = 0;
    end

    Correlation = double(mean(stats.corrm));
    if isnan(Correlation)
        Correlation = 0;
    end

    ClusterProminence = double(mean(stats.cprom));
    if isnan(ClusterProminence)
        ClusterProminence = 0;
    end

    ClusterShade = double(mean(stats.cshad));
    if isnan(ClusterShade)
        ClusterShade = 0;
    end

    Dissimilarity = double(mean(stats.dissi));
    if isnan(Dissimilarity)
        Dissimilarity = 0;
    end

    Energy = double(mean(stats.energ));
    if isnan(Energy)
        Energy = 0;
    end
    . 
    .
    .
    features=[autocorrelation, contrast, Correlation, Dissimilarity, Energy, Entropy, Homogeneity, MaximumProbability, SumAverage, SumVariance, SumEntropy, DifferenceVariance, DifferenceEntropy, InverseDifferenceMomentNormalized];

I use a loop to extract the features from all the training images:

dataDir = 'D:\1. Thesis FINISH!!!\Data set\0 Well Segmented Character\Advertising Bold 24\datatrain\';
srcFile = dir([dataDir '*.png']);
fetrain = [];
for a = 1:length(srcFile)
    file_name = [dataDir srcFile(a).name];   % index with the loop variable a
    A = imread(file_name);
    gl = EkstraksiFitur_GLCM2(A);
    fiturtrain = reshape(gl, [56, 1]);       % one column of 56 values per image
    fetrain = [fetrain fiturtrain];
%   vectorname = strcat(file_name,'_array.mat');
end
save('fetrain.mat', 'fetrain');

I've got the features.

[Image: extracted feature values]

Then I run the training process using a neural network, but I get a very low accuracy. This is the code:

% clc;clear;close all;
% function net1 = pelatihan (input, target)
net = newff(fetrain,target,[10 2],{'tansig','tansig'},'trainscg');
% net.trainParam.mem_reduc = 2;
net.performFcn = 'mse'; 
net.divideFcn = 'dividetrain';
% [trainInd,valInd,testInd] = dividetrain(601);
net.trainParam.show = 10; % Frequency of progress displays (in epochs).
net.trainParam.epochs = 1000; %default 1000
net.trainParam.goal = 1e-6;
net = train(net,fetrain,target);
output = round(sim(net,fetrain));
save net1.mat net
% net2 = output;
data = fetest;

[target; output];
prediksi = round(sim (net, data));
[targetx; prediksi];

%% Calculate the accuracy %
y = 1;
j = size (prediksi, 2); 
% x = size (targetx, 2);
for i = 1:j
    if prediksi(i) == targetx(i)
        y = y + 1;
    end
end
% y all correct data
% j all data
s = 'The accuracy is %.2f%%';
acc = 100 *(y/j);
sprintf (s,acc)

I've tried several times, but the accuracy (the NN test result) doesn't improve. It constantly gives 1.96%. Is there something wrong with the process flow, or with the code I've written?

Any help would be appreciated.

Upvotes: 0

Views: 507

Answers (1)

Feras

Reputation: 843

First, I can see from the features you extracted that they are not normalized and that they vary in range, which means some of the features will dominate the rest. Try to normalize or standardize the features. Is the accuracy you measure on the training set only, or are you using a test set or cross-validation? Is it true, as it appears, that you are using 601 features? Did you try feature selection methods to decide which features best suit the data and the model?
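As a minimal sketch of the standardization step, assuming the feature matrices are laid out like fetrain in the question (one row per feature, one column per image): the small matrices below are made-up stand-ins for fetrain and fetest, and mu, sigma, fetrainStd and fetestStd are names chosen only for this illustration. The statistics come from the training data alone and are then reused for the test data.

```matlab
% Toy stand-ins for fetrain/fetest (rows = features, columns = samples).
% The numbers are invented purely for illustration.
fetrain = [1 2 3 4; 100 200 300 400; 5 5 5 5];
fetest  = [2.5 5; 250 500; 5 5];

% Per-feature mean and standard deviation, from the training data only.
mu    = mean(fetrain, 2);
sigma = std(fetrain, 0, 2);
sigma(sigma == 0) = 1;   % constant features: avoid dividing by zero

% Apply the same shift and scale to the training and the test features.
fetrainStd = bsxfun(@rdivide, bsxfun(@minus, fetrain, mu), sigma);
fetestStd  = bsxfun(@rdivide, bsxfun(@minus, fetest,  mu), sigma);

% Each non-constant training row now has mean 0 and standard deviation 1.
disp(mean(fetrainStd, 2));
disp(std(fetrainStd, 0, 2));
```

The standardized matrices would then take the place of fetrain and fetest in the training and sim calls. If the Neural Network Toolbox is available, its mapstd and mapminmax functions do comparable row-wise scaling; the plain arithmetic above is only meant to make the idea explicit.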

Second, I would like to know what structure you are implementing, rather than having to read the full code to understand what you have done.

Third, it would be interesting to look at the input images to understand the environment you are dealing with.

Upvotes: 1
