Shourov Foisal

Reputation: 35

Automatically remove straight lines with Hough transform

I am doing a thesis on optical character recognition. My job is to properly segment text characters from images.

The problem is that in this language, the characters within a word are often connected by a straight line. These lines may or may not be of equal thickness.

So far, using a projection profile, I have been able to segment characters that are not attached to any straight line. To segment the characters that are connected by straight lines, I first have to remove those lines. I would prefer to use the Hough transform to detect and remove them (that is, in a BW image, if a pixel on the line is black, make it white).

See a sample image containing text: Sample Image

This is a line segmented from the above image using the projection profile.

And these are the lines detected using the Hough transform.

Here is the code for the Hough transform. Use this image to test it.

I = imread('line0.jpg');
%I = rgb2gray(I);
BW = edge(I,'canny');
[H,T,R] = hough(BW);
imshow(H,[],'XData',T,'YData',R,'InitialMagnification','fit');
xlabel('\theta'),ylabel('\rho');
axis on, axis normal, hold on;
P = houghpeaks(H,1,'threshold',ceil(0.3*max(H(:))));
x = T(P(:,2));
y = R(P(:,1));
plot(x,y,'s','color','blue');

% Find lines and plot them
lines = houghlines(BW,T,R,P,'FillGap',5,'MinLength',7);
figure, imshow(I), hold on
grid on
max_len = 0;

for k = 1:length(lines)
    xy = [lines(k).point1;lines(k).point2];
    plot(xy(:,1),xy(:,2),'LineWidth',1,'Color','green');

    % plot beginnings and ends of lines
    plot(xy(1,1),xy(1,2),'o','LineWidth',2,'Color','red');
    plot(xy(2,1),xy(2,2),'o','LineWidth',2,'Color','blue');

    % determine the endpoints of the longest line segment
    len = norm(lines(k).point1 - lines(k).point2);
    if( len > max_len )
        max_len = len;
        xy_long = xy;
    end
end

Any ideas on how I can do it? Any help will be appreciated!

Upvotes: 2

Views: 833

Answers (1)

Ash

Reputation: 328

Using the endpoints returned by houghlines, you just need to set the pixels along each detected line to white (255 in this case). You might have to play around with the padding a bit, to take off one or two more pixels above and below the line.
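As a rough sketch of that basic idea (not tuned for your images): the snippet below walks along each segment and whitens a band of `pad` rows around it. The tiny synthetic image and the hand-built `lines` struct are stand-ins I made up so that it runs on its own; in practice `lines` would come from houghlines, and `pad` is a guess you would adjust per image. It also assumes a 2-D uint8 image with dark text on white and roughly horizontal lines.

```matlab
% Synthetic test image: white background, one horizontal connecting line
% (rows 19-21) and one vertical "character" stroke crossing it.
I = 255*ones(40,120,'uint8');
I(19:21,10:110) = 0;   % connecting line, 3 pixels thick
I(10:30,60) = 0;       % part of a character

% Stand-in for the output of houghlines: points are [x y] = [column row].
lines = struct('point1',[10 20],'point2',[110 20]);

pad = 1;  % hypothetical half-thickness of the line in pixels
for k = 1:numel(lines)
    p1 = lines(k).point1;
    p2 = lines(k).point2;
    n = max(abs(p2 - p1)) + 1;                 % one sample per pixel step
    cols = round(linspace(p1(1),p2(1),n));
    rows = round(linspace(p1(2),p2(2),n));
    for d = -pad:pad
        r = min(max(rows + d,1),size(I,1));    % clamp to the image bounds
        I(sub2ind(size(I),r,cols)) = 255;      % paint the band white
    end
end
```

Note that anything crossing the line inside the band gets whitened too (the character stroke loses its pixels in rows 19-21), which is why the padding matters.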

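EDIT: Here is a version that attempts to determine the padding. For each detected line it probes px rows above and below the line, and whitens only those rows in which at least 60% of the pixels are dark (at or below white_threshold).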

%% OCR
I = imread('CEBML.jpg');
if size(I,3) == 3, I = rgb2gray(I); end % edge and sub2ind below expect a 2-D image
BW = edge(I,'canny');
[H,T,R] = hough(BW);
P = houghpeaks(H,1,'threshold',ceil(0.3*max(H(:))));

% Find lines and plot them
lines = houghlines(BW,T,R,P,'FillGap',5,'MinLength',7);
subplot(2,1,1)
grid on
imshow(I)
title('Input')
hold on
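px = 5; % Number of padding rows to probe above and below each line
white_threshold = 30; % Pixels at or below this intensity count as dark
ln_length = .6; % Fraction of a row that must be dark for it to be removed (60 %)
for k = 1:length(lines)
    xy = [lines(k).point1; lines(k).point2];
    % Assumes a near-horizontal line with point1 to the left of point2
    cols = xy(1,1):xy(2,1); % x-coordinates (columns) covered by the line
    % Row of the line at each column, offset by -px..px
    % (the implicit expansion requires MATLAB R2016b or later)
    rows = [repmat(xy(1,2),1,xy(2,1) - xy(1,1)),xy(2,2)] + (-px:px)';
    rows = min(max(rows,1),size(I,1)); % Clamp so the padding stays inside the image
    I_idx = sub2ind(size(I),rows,repmat(cols,size(rows,1),1));
    % Keep only the probed rows in which at least ln_length of the pixels
    % are dark, i.e. rows that really belong to the line.
    idx = sum(I(I_idx) <= white_threshold,2) >= ln_length * size(I_idx,2);
    I(I_idx(idx,:)) = 255;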

    % Some visualisation 
    [ixx,jyy] = ind2sub(size(I),I_idx(idx,:));
    plot(jyy,ixx,'.r');% Pixels set to white 
    plot(xy(:,1),xy(:,2),'-b','LineWidth',2);  % Found lines
end
subplot(2,1,2)
grid on
imshow(I)
title('Output')


Upvotes: 1
