jack

Reputation: 17911

Automatic font recognition with Python

As you may have heard, there is an online font recognition service called WhatTheFont.

I'm curious about the technology behind this tool. I think it can basically be separated into two parts:

  1. Generate images from font files of various formats; see http://www.fileinfo.com/filetypes/font for a list of font file extensions.

  2. Compare the submitted image with all of the generated images.

I would appreciate any advice or Python code for implementing the two steps above.

Upvotes: 6

Views: 9505

Answers (3)

Rodrigo Laguna

Reputation: 1850

This question is a little old, so here is an updated answer.

You should take a look at the paper DeepFont: Identify Your Font from An Image. Basically, it's a neural network trained on a very large number of images. It was presented commercially in this video.

Unfortunately, there is no official code available. However, there is an independent implementation available here. You'll need to train it yourself, since weights are not provided, but the code is really easy to follow. Also keep in mind that this implementation covers only a few fonts.

There is also a link to the dataset and a repo to generate more data.

Hope it helps.

Upvotes: 3

Steve Tjoa

Reputation: 61064

I can't offer Python code, but here are two possible approaches.

  1. "Eigen-characters." In face recognition, given a large training set of normalized facial images, you can use principal component analysis (PCA) to obtain a set of "eigenfaces": the directions along which the training faces, once projected, exhibit the greatest variance. The "coordinates" of an input test face with respect to the space of eigenfaces can then be used as the feature vector for classification. The same thing can be done with textual characters, e.g., using many versions of the character 'A' as the training set.

  2. Dynamic Time Warping (DTW). This technique is sometimes used for handwritten character recognition. The idea is that the trajectory taken by the tip of a pencil (i.e., d/dx, d/dy) is similar for similar characters. DTW compensates for some of the variation across instances of a single person's writing. Similarly, the outline of a character can represent a trajectory. This trajectory then becomes the feature vector for each font set. I guess the DTW part is not as necessary for font recognition because a machine creates the characters, not a human. But it may still be useful for resolving spatial ambiguities.
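
Not part of the original answer, but here is a minimal NumPy sketch of both ideas, in case it helps. It assumes every glyph has already been rendered and normalized to the same image size, and that an outline is available as a sequence of points. All function names (fit_eigen_characters, project, nearest_font, dtw_distance) are made up for illustration, and the nearest-neighbour classifier is just one simple choice among many.

```python
import numpy as np


def fit_eigen_characters(glyphs, n_components):
    """PCA on flattened glyph images; returns (mean, components)."""
    X = np.asarray(glyphs, dtype=float).reshape(len(glyphs), -1)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions ("eigen-characters"),
    # ordered by decreasing variance of the training data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(glyph, mean, components):
    """Coordinates of one glyph in the eigen-character subspace."""
    return components @ (np.asarray(glyph, dtype=float).ravel() - mean)


def nearest_font(glyph, mean, components, train_features, train_labels):
    """Label of the training glyph closest to `glyph` in feature space."""
    feature = project(glyph, mean, components)
    distances = np.linalg.norm(train_features - feature, axis=1)
    return train_labels[int(np.argmin(distances))]


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW cost between two point sequences."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    cost = np.full((len(a) + 1, len(b) + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = step + min(cost[i - 1, j],
                                    cost[i, j - 1],
                                    cost[i - 1, j - 1])
    return float(cost[-1, -1])
```

To use it, you would fit on the rendered 'A' from each font, store project(...) for every training glyph as train_features, and then call nearest_font on the glyph cut out of the submitted image. dtw_distance could be used instead to compare outline trajectories of two glyphs directly.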

Upvotes: 3

tom10

Reputation: 69212

As the OP states, there are two parts (and probably also a third part):

  1. Use PIL to generate images from fonts.

  2. Use an image analysis toolkit, like OpenCV (which has Python bindings), to compare different shapes. There are a variety of standard techniques for comparing objects to see whether they're similar. For example, scale-invariant moments work fairly well and are part of the OpenCV toolkit.

  3. Most of the standard tools in #2 are designed to look for similar but not necessarily identical shapes, but for font comparison this might not be what you want, since the differences between fonts can come down to very fine details. For fine-detail analysis, try comparing the x and y profiles of a perimeter path around each letter, appropriately normalized, of course. (This, or a more mathematically complicated variant of it, has been used with good success in font analysis.)
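
To make steps 1 and 2 a bit more concrete, here is a rough sketch (an illustration, not a tested font matcher). It renders one character with Pillow, the maintained fork of PIL, and computes the first two Hu moment invariants with plain NumPy so the arithmetic is visible. The helper names render_glyph and hu_first_two are made up for this example, and Pillow's built-in default font is used only so that the snippet runs without a font file; for real fonts you would pass something like ImageFont.truetype(path, size).

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont


def render_glyph(char, font=None, canvas=64):
    """Draw one character in white on black; return a boolean ink mask."""
    if font is None:
        # Assumption for the sketch: fall back to Pillow's built-in font.
        font = ImageFont.load_default()
    img = Image.new("L", (canvas, canvas), 0)
    ImageDraw.Draw(img).text((canvas // 4, canvas // 4), char,
                             fill=255, font=font)
    return np.asarray(img) > 0  # any non-zero coverage counts as ink


def hu_first_two(mask):
    """First two Hu moment invariants of a non-empty binary shape."""
    ys, xs = np.nonzero(mask)
    area = float(len(xs))  # zeroth moment of a binary image
    dx, dy = xs - xs.mean(), ys - ys.mean()
    # Normalized central moments eta_pq = mu_pq / mu_00^(1 + (p + q) / 2);
    # for second-order moments the exponent is 2.
    eta20 = (dx ** 2).sum() / area ** 2
    eta02 = (dy ** 2).sum() / area ** 2
    eta11 = (dx * dy).sum() / area ** 2
    return np.array([eta20 + eta02,
                     (eta20 - eta02) ** 2 + 4 * eta11 ** 2])
```

Two glyphs could then be compared with something like np.linalg.norm(hu_first_two(render_glyph("A", font_1)) - hu_first_two(render_glyph("A", font_2))). In OpenCV, cv2.moments followed by cv2.HuMoments gives all seven invariants. As step 3 warns, these coarse descriptors may not separate fonts that differ only in fine detail.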

Upvotes: 5
