Reputation: 77
I'm trying to read the values in this image into variables using OCR in MATLAB. I'm having trouble doing so, so I tried to split up this image into smaller parts using the white boundary lines then trying to read it, but I dont know how to do this. Any help would be appreciated, thanks.
Upvotes: 1
Views: 823
Reputation: 39439
There is now a built-in ocr
function in the Computer Vision System Toolbox.
Upvotes: 0
Reputation: 221774
You can sum
along the columns of a binary image corresponding to that input image and find peaks from the sum
values. This is precisely achieved in the code here -
img = imread('http://puu.sh/cU3Nj/b020b60f0b.png');
BW = im2bw(img,0.1); %// convert to a binary image with a low threshold
peak_sum_max = 30; %// max of sum of cols to act as threshold to decide as peak
peaks_min_width = 10; %// min distance between peaks i.e. min width of each part
idx = find( sum(BW,1)>=peak_sum_max );
split_idx = [1 idx( [true diff(idx)>peaks_min_width ] )];
split_imgs = arrayfun(@(x) img(:,split_idx(x):split_idx(x+1)),...
1:numel(split_idx)-1,'Uni',0);
%// Display split images
for iter = 1:numel(split_imgs)
figure,imshow(split_imgs{iter})
end
Please note that the final output split_imgs
is a cell array with each cell holding image data for each split image.
If you would like to have the split images directly without the need for messing with cell arrays, after you have split_idx
, you can do this -
%// Get and display split images
for iter = 1:numel(split_idx)-1
split_img = img(:,split_idx(iter):split_idx(iter+1));
figure,imshow(split_img)
end
Upvotes: 1
Reputation: 651
If the blocks are always delimited by a completely vertical line, you can find where they are by comparing the original image (here transformed from RGB to grayscale to be a single plane) to a matrix that is made of repeats of the first row of the original image only. Since the lines are vertical the intensity of the pixels in the first line will be the same throughout. This generates a binary mask that can be used in conjunction with a quick thresholding to reject those lines that are all black pixels in every row. Then invert this mask and use regionprops
to locate the bounding box of each region. Then you can pull these out and do what you like.
If the lines dividing the blocks of text are not always vertical or constant intensity throughout then there's a bit more work that needs to be done to locate the dividing lines, but nothing that's impossible. Some example data would be good to have in that case, though.
img = imread('http://puu.sh/cU3Nj/b020b60f0b.png');
imshow(img);
imgGray = rgb2gray(img);
imgMatch = imgGray == repmat(imgGray(1,:), size(imgGray, 1), 1);
whiteLines = imgMatch & (imgGray > 0);
boxes = regionprops(~whiteLines, 'BoundingBox');
for k = 1:6
subplot(3,2,k)
boxHere = round(boxes(k).BoundingBox);
imshow(img(boxHere(2):(boxHere(2)+boxHere(4)-1), boxHere(1):(boxHere(1)+boxHere(3)-1), :));
end
Upvotes: 1