MaiaVictor

Reputation: 53017

Comparing strings for their similarities?

I want to count the number of occurrences of a certain college course in a list of thousands of entries. The problem is that the course is not always spelled the same way. For example, Computer Engineering can be spelled Computers Engineering. What is a proper, elegant way to test whether two strings are very similar?

Upvotes: 0

Views: 109

Answers (1)

amit

Reputation: 178481

I would try to canonicalize the strings using stemming. The idea is to give each string a canonical form; two different strings that represent the same word are very likely to have the same canonical form (for example, Computer and Computers will have the same canonical form, and you will get a match).

The Porter stemming algorithm is often used for this kind of canonicalization.
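To make the idea concrete, here is a minimal sketch of counting entries by canonical form. Note that `crude_stem` is a hypothetical helper that only lowercases and strips a trailing plural "s"; it is a rough stand-in for illustration, not the Porter algorithm, and the sample `entries` list is made up.

```python
from collections import Counter


def crude_stem(word):
    """Rough stand-in for a real stemmer (NOT Porter):
    lowercases the word and strips one trailing plural 's'."""
    word = word.lower()
    # Leave short words and words ending in 'ss' (e.g. "class") alone.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def canonical_form(name):
    """Stem every whitespace-separated word and rejoin with single spaces."""
    return " ".join(crude_stem(w) for w in name.split())


# Hypothetical sample data, assumed for illustration only.
entries = [
    "Computer Engineering",
    "Computers Engineering",
    "computer  engineering",
    "Physics",
]

# Entries that share a canonical form are counted together.
counts = Counter(canonical_form(e) for e in entries)
print(counts["computer engineering"])
```

In practice you would probably swap `crude_stem` for a real stemmer (for instance, NLTK ships a `PorterStemmer` class with a `stem` method), since a real stemmer handles many more suffixes than a plain plural "s".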


An alternative is to grade the strings by the distance between them. The suggested Levenshtein distance can help you with that, but personally I'd prefer canonicalization.
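For the distance-based route, a rough sketch of Levenshtein distance using the standard dynamic-programming recurrence might look like the following. The `is_similar` helper and its `max_distance=2` threshold are assumptions for illustration; you would need to tune the threshold against your own data.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep b as the shorter string so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def is_similar(a, b, max_distance=2):
    """Hypothetical helper: case-insensitive match within max_distance edits."""
    return levenshtein(a.lower(), b.lower()) <= max_distance


print(levenshtein("Computer Engineering", "Computers Engineering"))
```

One trade-off to keep in mind: a fixed edit-distance threshold can also match genuinely different short names, which is one reason canonicalization may be the safer first step.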

Upvotes: 2
