Ryan Fernandes

Reputation: 8526

Levenshtein distance on non-English strings

Will the Levenshtein distance algorithm work well for non-English language strings too?

Update: Would this work automatically in a language like Java when comparing Asian characters?

Upvotes: 6

Views: 3428

Answers (3)

Dewfy

Reputation: 23624

Only if the language is letter-based, for example Russian or German. For logographic scripts (Chinese, for example) or syllabic scripts (such as Lao), it will not.

Upvotes: 4

Select0r

Reputation: 12638

Levenshtein doesn't care about languages, it just tells you how many characters need to be changed (added, removed, exchanged) to get from one string to the other.

So: yes, but you'll have to check your character encoding; otherwise some foreign "single" characters may be treated as two (or more) characters.
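To illustrate the encoding point, here is a minimal sketch in Python 3 (my choice of language, not something the question specifies). The `levenshtein` function is a plain textbook dynamic-programming version written for this example, not a library API. It compares whatever elements the sequence yields, so the same pair of words gives a different distance depending on whether you pass decoded text or raw UTF-8 bytes.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (str, bytes, list, ...)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]


# As text: one character differs.
print(levenshtein("日本語", "日本人"))  # 1

# As UTF-8 bytes: the differing character occupies three bytes,
# and all three of them differ, so the distance is inflated.
print(levenshtein("日本語".encode("utf-8"), "日本人".encode("utf-8")))  # 3
```

The algorithm is the same in both calls; only the unit of comparison changes.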

Upvotes: 1

ondra

Reputation: 9331

Yes. But you have to treat each non-English character as one character, not as multiple characters (as happens with raw UTF-8 bytes, for example). In Python, for instance, you would use the unicode class to represent the string (and its characters).
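A rough sketch of that idea in Python 3, where `str` already plays the role of the unicode class: decode the bytes to text before measuring the distance. The helper names `levenshtein` and `text_distance` are made up for this example. The NFC normalization step is an extra assumption of mine, not part of the answer above; it makes an accented letter written as base letter plus combining mark count as one character too.

```python
import unicodedata


def levenshtein(a, b):
    """Textbook edit distance over the elements of two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def text_distance(raw_a, raw_b, encoding="utf-8"):
    """Decode bytes to text, normalize to NFC, then compare code points."""
    a = unicodedata.normalize("NFC", raw_a.decode(encoding))
    b = unicodedata.normalize("NFC", raw_b.decode(encoding))
    return levenshtein(a, b)


# One differing character, counted once after decoding.
print(text_distance("日本語".encode("utf-8"), "日本人".encode("utf-8")))  # 1

# Precomposed "é" versus "e" + combining acute accent: equal after NFC.
print(text_distance("caf\u00e9".encode("utf-8"), "cafe\u0301".encode("utf-8")))  # 0
```

This still compares code points, so sequences that NFC cannot compose into a single code point will count as more than one character.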

Upvotes: 3
