Blender

Reputation: 298176

Accurate binary image classification

I'm trying to extract letters from a game board for a project. Currently, I can detect the game board, segment it into the individual squares and extract images of every square.

The input I'm getting is like this (these are individual letters):

[Images: six sample letter squares]

At first, I was counting the number of black pixels per image and using that as a way of identifying the different letters, which worked somewhat well for controlled input images. The problem I have, though, is that I can't make this work for images that differ slightly from these.

I have around 5 samples of each letter to work with for training, which should be good enough.

Does anybody know what would be a good algorithm to use for this?

My ideas were (after normalizing the image):

Any help would be appreciated!

Upvotes: 14

Views: 5061

Answers (7)

CopyPasteIt

Reputation: 574

Since your images are coming off a computer screen of a board game, the variation can't be 'too crazy'. I just got something working for the same type of problem. I normalized my images by cropping right down to the 'core'.

With 5 samples per letter, you might already have complete coverage.

I organized my work by 'stamping' the identifier at the start of the image filename. I then could sort on the filename (=identifier). Windows Explorer allows you to view the directory with Medium Icons turned on. I would get the identifier by a 'fake-rename' action and copy it into the Python program.

Here is some working code that can be revamped for any of these problems.

import numpy as np

def getLetter(im):
    # Assumes a binary image whose white pixels convert to 1 (True) and black to 0.
    area = im.height * im.width
    white_area = np.sum(np.array(im))
    black_area = area - white_area
    black_ratio = black_area / area           # between 0 and 1
    # np.isclose compares with a small tolerance; exact float equality is brittle.
    if np.isclose(black_ratio, .7407407407407407) or \
       np.isclose(black_ratio, .688034188034188):
        return 'A'
    if np.isclose(black_ratio, .797979797979798):
        return 'T'
    if np.isclose(black_ratio, .803030303030303):
        return 'I'
    if np.isclose(black_ratio, .5050505050505051) or \
       np.isclose(black_ratio, .5555555555555556):
        return 'H'
    ############ ... etc.

    return '@' # when this comes out you have some more work to do

Note: It is possible that the same identifier (here we are using black_ratio) might point to more than one letter. If it happens, you'll need to take another attribute of the image to discriminate between them.
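
As a rough sketch of that idea, the lookup key can combine two attributes instead of one. The second attribute used here (the share of black pixels in the top half) is only an assumption picked for illustration, as are the name `signature`, the tiny 4x4 tiles, and the labels 'A' and 'B'.

```python
import numpy as np

def signature(pixels, digits=3):
    """Return (black_ratio, top_half_black_ratio), rounded for stable lookup.

    `pixels` is a 2-D boolean array where True means white (an assumption).
    """
    black = ~np.asarray(pixels, dtype=bool)
    top = black[: black.shape[0] // 2]
    return (round(float(black.mean()), digits), round(float(top.mean()), digits))

# Hypothetical 4x4 tiles: both are 50% black, but in different places.
tile_a = np.array([[0, 0, 0, 0],
                   [0, 0, 0, 0],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1]], dtype=bool)   # black on top
tile_b = tile_a[::-1]                            # black on the bottom

# The labels are made up; the point is that the two keys differ.
table = {signature(tile_a): 'A', signature(tile_b): 'B'}
print(table[signature(tile_b)])
```

Both tiles share the same black ratio, so a single-attribute lookup could not tell them apart; the tuple key can.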

Upvotes: 0

sj7

Reputation: 1321

You can try building a model by uploading your training data (~50 images of 1s,2s,3s....9s) to demo.nanonets.ai (free to use)

1) Upload your training data here:

demo.nanonets.ai

2) Then query the API using the following (Python Code):

import requests
import urllib.request

model_name = "Enter-Your-Model-Name-Here"
image_url = "http://images.clipartpanda.com/number-one-clipart-847-blue-number-one-clip-art.png"
# download the image to classify and attach its bytes as the upload
files = {'uploadfile': urllib.request.urlopen(image_url).read()}
url = "http://demo.nanonets.ai/classify/?appId=" + model_name
r = requests.post(url, files=files)
print(r.json())

3) The response looks like:

{
  "message": "Model trained",
  "result": [
    {
      "label": "1",
      "probability": 0.95
    },
    {
      "label": "2",
      "probability": 0.01
    },

     ....

    {
      "label": "9",
      "probability": 0.005
    }
  ]
}

Upvotes: 1

moooeeeep

Reputation: 32512

I think this is some sort of Supervised Learning. You need to do some feature extraction on the images and then do your classification on the basis of the feature vector you've computed for each image.

Feature Extraction

At first sight, the feature extraction part looks like a good fit for Hu moments. Calculate the image moments, then compute cv::HuMoments from them. This gives you a 7-dimensional real-valued feature space (one feature vector per image). Alternatively, you could omit this step and use each pixel value as a separate feature. I think the suggestion in this answer goes in this direction, but adds a PCA compression to reduce the dimensionality of the feature space.
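
To make the idea concrete, here is a minimal NumPy sketch that computes only the first two of the seven Hu invariants by hand (cv::HuMoments would return all seven). The function name `hu_first_two` and the small test glyph are assumptions for illustration, and the image is assumed to contain at least one nonzero pixel.

```python
import numpy as np

def hu_first_two(image):
    """First two Hu invariants of a 2-D array (nonzero = foreground)."""
    img = np.asarray(image, dtype=float)
    ys, xs = np.mgrid[: img.shape[0], : img.shape[1]]
    m00 = img.sum()
    # Centroid of the foreground mass.
    cx = (xs * img).sum() / m00
    cy = (ys * img).sum() / m00

    def eta(p, q):
        # Central moment, normalized so that it is scale invariant.
        mu = (((xs - cx) ** p) * ((ys - cy) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([h1, h2])

# A small "L"-like glyph and the same glyph shifted inside the frame.
glyph = np.zeros((8, 8))
glyph[1:5, 1] = 1
glyph[4, 1:4] = 1
shifted = np.roll(glyph, (2, 3), axis=(0, 1))

print(hu_first_two(glyph))
print(hu_first_two(shifted))   # same values: the features ignore translation
```

The shifted copy yields the same feature vector, which is the property that makes these moments attractive when the letter does not sit at exactly the same place in every square.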

Classification

As for the classification part, you can use almost any classification algorithm you like. You could use an SVM for each letter (binary yes/no classification), a naive Bayes classifier (which letter is most likely), or a k-nearest-neighbor approach (kNN, minimum spatial distance in feature space), e.g. with FLANN.
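
As an illustration of the kNN option, the following is a brute-force sketch in plain NumPy rather than FLANN. The function name `knn_predict` and the 2-D feature vectors are made up; real feature vectors would come from the feature extraction step.

```python
from collections import Counter

import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training vectors closest to x."""
    train_X = np.asarray(train_X, dtype=float)
    dists = np.linalg.norm(train_X - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors for two letters.
train_X = [[0.10, 0.20], [0.15, 0.25], [0.12, 0.18],
           [0.80, 0.90], [0.85, 0.95], [0.90, 0.80]]
train_y = ['A', 'A', 'A', 'T', 'T', 'T']

print(knn_predict(train_X, train_y, [0.2, 0.2]))
print(knn_predict(train_X, train_y, [0.7, 0.9]))
```

With only about five samples per letter, a small `k` (1 or 3) is probably the sensible range.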

Especially for distance-based classifiers (e.g. kNN), you should consider normalizing your feature space (e.g. scale all dimension values to a common range for Euclidean distance, or use something like the Mahalanobis distance). This avoids overrepresenting features with large value differences in the classification process.
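
One simple way to do the range scaling is min-max normalization, sketched below. The helper names `fit_minmax` and `apply_minmax` and the sample numbers are assumptions; the scaling parameters are learned from the training set and then reused for new images.

```python
import numpy as np

def fit_minmax(X):
    """Learn per-feature offset and span from the training vectors."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Constant features get a span of 1 to avoid dividing by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span

def apply_minmax(X, lo, span):
    """Scale vectors with previously learned parameters."""
    return (np.asarray(X, dtype=float) - lo) / span

# Hypothetical features: a ratio in [0, 1] next to a raw pixel count.
features = [[0.5, 1000.0],
            [0.6, 3000.0],
            [0.7, 2000.0]]
lo, span = fit_minmax(features)
scaled = apply_minmax(features, lo, span)
print(scaled)
```

Without the scaling, the pixel-count column would dominate any Euclidean distance and the ratio column would barely matter.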

Evaluation

Of course, you need training data, that is, the images' feature vectors labeled with the correct letter. You also need a procedure to evaluate your classifier, e.g. cross-validation.
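
With only a handful of samples per letter, leave-one-out is a natural form of cross-validation: each sample is held out once and classified using all the others. The sketch below pairs it with a 1-nearest-neighbor classifier purely as an example; the function names and the toy data are assumptions.

```python
import numpy as np

def nearest_neighbor(train_X, train_y, x):
    """Label of the single closest training vector."""
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(dists))]

def leave_one_out_accuracy(X, y):
    """Hold out each sample once and report the fraction classified correctly."""
    X = np.asarray(X, dtype=float)
    y = list(y)
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        train_y = [label for label, kept in zip(y, keep) if kept]
        hits += nearest_neighbor(X[keep], train_y, X[i]) == y[i]
    return hits / len(X)

# Hypothetical, well-separated feature vectors for two letters.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y = ['A', 'A', 'A', 'T', 'T', 'T']
print(leave_one_out_accuracy(X, y))
```

A low score here would suggest that the chosen features do not separate the letters, before any new game board is involved.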


You might also want to have a look at template matching. There you would convolve the candidate image with the available patterns in your training set. High values in the output image indicate a high probability that the pattern is located at that position.
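
A small sketch of the sliding comparison, written with plain NumPy loops for clarity. It uses normalized cross-correlation as the score, which is one common choice and an assumption here (OpenCV's matchTemplate offers comparable modes); the function name and the tiny "plus" pattern are made up.

```python
import numpy as np

def match_template(image, template):
    """Return a map of normalized cross-correlation scores in [-1, 1]."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # Flat patches carry no pattern information; score them as 0.
            out[r, c] = (p * t).sum() / denom if denom > 0 else 0.0
    return out

# Hypothetical 3x3 pattern placed at row 2, column 1 of an empty 6x6 image.
plus = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]])
image = np.zeros((6, 6))
image[2:5, 1:4] = plus

scores = match_template(image, plus)
best = np.unravel_index(np.argmax(scores), scores.shape)
print(best, scores[best])
```

For classification, one would run each letter's template over the candidate square and pick the letter with the highest peak score.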

Upvotes: 14

Abid Rahman K

Reputation: 52646

I had a similar problem a few days back, but it was digit recognition, not letters.

I implemented a simple OCR for this using kNearestNeighbour in OpenCV.

Below is the link and code:

Simple Digit Recognition OCR in OpenCV-Python

Adapt it for letters. Hope it works.

Upvotes: 3

Sam

Reputation: 20058

Please look at these two answers related to OCR

Scoreboard digit recognition using OpenCV

and here

OCR of low-resolution text from screenshots

Upvotes: 0

karlphillip

Reputation: 93410

Upvotes: 4

Chris Eberle

Reputation: 48775

This is a recognition problem. I'd personally use a combination of PCA and a machine learning technique (likely SVM). These are fairly large topics so I'm afraid I can't really elaborate too much, but here's the very basic process:

  1. Gather your training images (more than one per letter, but don't go crazy)
  2. Label them (could mean a lot of things, in this case it means group the letters into logical groups -- All A images -> 1, All B images -> 2, etc.)
  3. Train your classifier
    • Run everything through PCA decomposition
    • Project all of your training images into PCA space
    • Run the projected images through an SVM (if it's a one-class classifier, do them one at a time, otherwise do them all at once.)
    • Save off your PCA eigenvectors and SVM training data
  4. Run recognition
    • Load in your PCA space
    • Load in your SVM training data
    • For each new image, project it into PCA space and ask your SVM to classify it.
    • If you get an answer (a number) map it back to a letter (1 -> A, 2 -> B, etc).
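
The steps above can be sketched roughly as follows. To keep the example dependency-free, PCA is done with NumPy's SVD and a nearest-centroid rule stands in for the SVM, so this shows the shape of the pipeline rather than the exact classifier suggested. All function names, the synthetic 4x4 "letters", and the label mapping are assumptions.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean image and the top principal directions (one row each)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the eigenvectors of the covariance matrix, strongest first.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Project flattened images into PCA space."""
    return (np.asarray(X, dtype=float) - mean) @ components.T

def fit_centroids(Z, labels):
    """Stand-in for SVM training: one mean point per class in PCA space."""
    labels = np.asarray(labels)
    return {int(l): Z[labels == l].mean(axis=0) for l in np.unique(labels)}

def classify(z, centroids):
    """Return the numeric label whose centroid is closest to z."""
    return min(centroids, key=lambda l: np.linalg.norm(z - centroids[l]))

# Synthetic data: label 1 is a vertical bar, label 2 a horizontal bar.
rng = np.random.default_rng(0)
bar_v = np.zeros((4, 4))
bar_v[:, 1] = 1
bar_h = np.zeros((4, 4))
bar_h[1, :] = 1

def noisy(base, n):
    """n flattened copies of a 4x4 base image with small Gaussian noise."""
    return base.ravel() + 0.1 * rng.normal(size=(n, 16))

train_X = np.vstack([noisy(bar_v, 5), noisy(bar_h, 5)])
train_y = [1] * 5 + [2] * 5

mean, components = fit_pca(train_X, n_components=2)
centroids = fit_centroids(project(train_X, mean, components), train_y)

label_to_letter = {1: 'A', 2: 'B'}   # map the numeric answer back to a letter
z = project(noisy(bar_h, 1), mean, components)[0]
predicted = label_to_letter[classify(z, centroids)]
print(predicted)
```

Saving and reloading `mean`, `components`, and the trained classifier between the training and recognition steps is left out here; swapping the centroid rule for an actual SVM would only change `fit_centroids` and `classify`.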

Upvotes: 5
