Reputation: 518
The method to perform character recognition for single digits is pretty easy. But this is when the image only contains ONE digit.
When the image contains multiple digits, we can't use the same algorithm since the entire bitmap is different. How do we process the image to split it, so we can "modularise" the OCR operation on each of the individual digits?
Upvotes: 5
Views: 976
Reputation: 3027
What you want to solve is an image segmentation problem, not a digit classification problem, just like @VitaliPro said. Both are OCR problems all right, but (as a huge simplification) classification asks "what character is this" and segmentation asks "how many characters do I have here, and where are they". You already know how to solve the first problem, so let's look at how the second is commonly solved.
You want to segment the image into characters (known as "regions" in segmentation), and then apply the digit classification to each region. One way to do it is to perform Watershed Segmentation, which uses a gradient of colours to distinguish edges and areas.
A simple watershed can be done with Python's numpy/scipy/skimage, for example:
#!/usr/bin/env python
from PIL import Image
import numpy as np
from scipy import ndimage
from skimage import morphology as morph
from skimage.filters import rank  # skimage.filter in very old releases
from skimage.segmentation import watershed  # morph.watershed in older releases

def big_regions(lb, tot):
    # (pixel count, label) pairs, biggest region first
    l = []
    for i in range(1, tot+1):
        l.append(((i == lb).sum(), i))
    l.sort()
    l.reverse()
    return l

def segment(img, outimg):
    # expects a greyscale image (a 2D uint8 array)
    img = np.array(Image.open(img))
    # denoise (slight blur) so that tiny regions are not picked up
    den = rank.median(img, morph.disk(3))
    # continuous regions (low gradient)
    markers = rank.gradient(den, morph.disk(5)) < 10
    mrk, tot = ndimage.label(markers)
    grad = rank.gradient(den, morph.disk(2))
    labels = watershed(grad, mrk)
    print('Total regions:', tot)
    regs = big_regions(labels, tot)
There I'm using the watershed segmentation from skimage (it lived in the morph module when this was written; current releases expose it as skimage.segmentation.watershed).
Most of the time with watershed you ought to place the region on top of the image to get the actual content of the region, which I am not doing in the code above. Yet, that is not needed for digits or most text since it is expected to be black and white.
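Since the question is about handing each digit to a single-digit classifier, here is a minimal sketch of that last step. It assumes a 2D label array in which 0 means background; with the watershed output above the background is itself a region (usually the biggest one), so you would skip it first. The name crop_regions and the min_pixels cutoff are my own placeholders, not part of scipy or skimage, and the demo uses plain connected-component labelling on a synthetic array rather than a real scan:

```python
import numpy as np
from scipy import ndimage


def crop_regions(labels, min_pixels=4):
    """Return one boolean crop per labelled region, ordered left to right.

    `labels` is a 2-D integer array where 0 means background.
    `min_pixels` is a hypothetical noise cutoff; tune it for your images.
    """
    crops = []
    # find_objects yields the bounding-box slices of label 1, 2, ... in order
    # (None for a label value that does not occur in the array).
    for index, box in enumerate(ndimage.find_objects(labels), start=1):
        if box is None:
            continue
        mask = labels[box] == index
        if mask.sum() < min_pixels:
            continue  # most likely a speck of noise
        crops.append((box[1].start, mask))
    # Sort by the left edge so the digits come out in reading order.
    crops.sort(key=lambda item: item[0])
    return [mask for _, mask in crops]


# Synthetic "image": two blobs and a one-pixel speck.
binary = np.zeros((7, 12), dtype=bool)
binary[1:6, 1:4] = True    # first "digit", 5 x 3 pixels
binary[2:5, 6:10] = True   # second "digit", 3 x 4 pixels
binary[0, 11] = True       # noise

labels, total = ndimage.label(binary)
digits = crop_regions(labels)
print(total, [d.shape for d in digits])
```

Each crop can then be resized to whatever input size your single-digit recogniser expects. Note that this only separates digits that do not touch; touching digits are where the watershed markers earn their keep.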
Watershed uses colour gradients to identify edges, but filters such as Canny or Sobel can be used as well. Note that I am denoising the image (a slight blurring) to prevent very small regions from being found, since those are most likely artifacts or noise. Using Canny or Sobel filters may require more denoising steps, since those filters produce crisp edges.
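To make the Sobel alternative concrete, here is a rough sketch that builds a gradient image with scipy.ndimage instead of rank.gradient. The function name sobel_gradient and the blur_sigma default are assumptions of mine, and the Gaussian blur stands in for the median denoising used above:

```python
import numpy as np
from scipy import ndimage


def sobel_gradient(img, blur_sigma=1.0):
    """Gradient magnitude of a 2-D greyscale image via Sobel filters.

    `blur_sigma` is an assumed default for the Gaussian pre-blur.
    """
    smooth = ndimage.gaussian_filter(img.astype(float), sigma=blur_sigma)
    dx = ndimage.sobel(smooth, axis=1)  # changes along the horizontal axis
    dy = ndimage.sobel(smooth, axis=0)  # changes along the vertical axis
    return np.hypot(dx, dy)


# A bright square on a dark background: the gradient is close to zero in
# flat areas and largest along the outline of the square.
img = np.zeros((20, 20))
img[5:15, 5:15] = 255
grad = sobel_gradient(img)
```

The result is a float array rather than the uint8 image that rank.gradient returns, so a marker threshold such as the "< 10" used above would need retuning before passing it on to the watershed.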
Segmentation is used for much more than character splitting. It is often used on images to distinguish important regions (i.e. big regions of very similar appearance). For example, if I add some matplotlib to the above and change the segment function, say:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from skimage.segmentation import watershed  # morph.watershed in older releases

def plot_seg(spr, spc, sps, img, cmap, alpha, xlabel):
    # one panel of the spr x spc grid, without axis ticks
    plt.subplot(spr, spc, sps)
    plt.imshow(img, cmap=cmap, interpolation='nearest', alpha=alpha)
    plt.yticks([])
    plt.xticks([])
    plt.xlabel(xlabel)

def plot_mask(spr, spc, sps, reg, lb, regs, cmap, xlabel):
    # show only the reg-th biggest region of the label array
    masked = np.ma.masked_array(lb, ~(lb == regs[reg][1]))
    plot_seg(spr, spc, sps, masked, cmap, 1, xlabel)

def plot_crop(spr, spc, sps, reg, img, lb, regs, cmap):
    # use the region as a mask on the image, then drop empty rows and columns
    masked = np.ma.masked_array(img, ~(lb == regs[reg][1]))
    crop = masked[~np.all(masked == 0, axis=1), :]
    crop = crop[:, ~np.all(crop == 0, axis=0)]
    plot_seg(spr, spc, sps, crop, cmap, 1, '%i px' % regs[reg][0])

def segment(img, outimg):
    img = np.array(Image.open(img))
    den = rank.median(img, morph.disk(3))
    # continuous regions (low gradient)
    markers = rank.gradient(den, morph.disk(5)) < 10
    mrk, tot = ndimage.label(markers)
    grad = rank.gradient(den, morph.disk(2))
    labels = watershed(grad, mrk)
    print('Total regions:', tot)
    regs = big_regions(labels, tot)
    spr = 3
    spc = 6
    # cm.nipy_spectral was called cm.spectral in old matplotlib releases
    plot_seg(spr, spc, 1, img, cm.gray, 1, 'image')
    plot_seg(spr, spc, 2, den, cm.gray, 1, 'denoised')
    plot_seg(spr, spc, 3, grad, cm.nipy_spectral, 1, 'gradient')
    plot_seg(spr, spc, 4, mrk, cm.nipy_spectral, 1, 'markers')
    plot_seg(spr, spc, 5, labels, cm.nipy_spectral, 1, 'regions\n%i' % tot)
    plot_seg(spr, spc, 6, img, cm.gray, 1, 'composite')
    plot_seg(spr, spc, 6, labels, cm.nipy_spectral, 0.7, 'composite')
    plot_mask(spr, spc, 7, 0, labels, regs, cm.nipy_spectral, 'main region')
    plot_mask(spr, spc, 8, 1, labels, regs, cm.nipy_spectral, '2nd region')
    plot_mask(spr, spc, 9, 2, labels, regs, cm.nipy_spectral, '3rd region')
    plot_mask(spr, spc, 10, 3, labels, regs, cm.nipy_spectral, '4th region')
    plot_mask(spr, spc, 11, 4, labels, regs, cm.nipy_spectral, '5th region')
    plot_mask(spr, spc, 12, 5, labels, regs, cm.nipy_spectral, '6th region')
    plot_crop(spr, spc, 13, 0, img, labels, regs, cm.gray)
    plot_crop(spr, spc, 14, 1, img, labels, regs, cm.gray)
    plot_crop(spr, spc, 15, 2, img, labels, regs, cm.gray)
    plot_crop(spr, spc, 16, 3, img, labels, regs, cm.gray)
    plot_crop(spr, spc, 17, 4, img, labels, regs, cm.gray)
    plot_crop(spr, spc, 18, 5, img, labels, regs, cm.gray)
    # the Agg backend cannot open a window, so write the figure to outimg
    plt.savefig(outimg)
(This sample does not run by itself; you need to add the other code sample above to the top of it.)
I can make quite a nice segmentation of any image, e.g. the result of the above:
The first row shows the steps of the segment function, in the second row you have the regions, and in the third you have the regions used as a mask on top of the image.
(P.S. Yes, the plot code is quite ugly, but it is easy to understand and change)
Upvotes: 1
Reputation: 2842
Follow the following steps:
Upvotes: 2