Zack Burt
Zack Burt

Reputation: 8455

Write a Python method to generate typos based on a string

I could just add in something that creates typos based on Levenshtein distance of two, or something like that, or reverse-engineer Norvig's article on spellchecking.

However, what are the most common ways to typos?

Has somebody written a method?

Upvotes: 3

Views: 2499

Answers (1)

plaes
plaes

Reputation: 32716

There's no such thing as general typo generation algorithm because this kind of algorithm depends on the target language and application - ie to generate spam domains you basically need to apply following strategies (using meta.stackoverflow.com as an example):

  1. missing dots: met*as*tackoverflow.com (should be easy ;)
  2. character insertion: meta.stackoverflo*ww*.com (just add a dupe of every character)
  3. character omission: meta.stackoverf*lw*.com (just drop a character)
  4. character permutation: meta.stackove*fr*low.com (pure mathematics here)
  5. character replacement: meta.*d*tackoverflow.com (now here we can have at least two strategies, see below)

In case of character replacement we can have at least two scenarios:

  1. Similar sounding letters (ie c <-> k, z <-> ts ) depending on language
  2. Nearby letters proximity typos (ie for qwerty s <-> d, d <-> f) Duh, I actually made a typo here with s <-> d case :)

Hope this helps..

Upvotes: 2

Related Questions