Efficient Pattern Matching / String Merging Algorithm

I'm looking for an Algorithm (Preferably with a java implementation) for merging Strings.

my problem is as following :

suppose I have an Array/List of Strings {"myString1" , "my String1" , "my-String-1" ... } I'd like the algorithm to point out that there is a very high probability that all of these values denote the "myString1".

so I would like to compact my list. maybe this can be done with KMP or maybe there is something more suitable.

Thanks.

Upvotes: 0

Views: 749

Answers (2)

barak1412
barak1412

Reputation: 1160

I think that Edit distance is good heuristic for merging strings.

EDIT:

You can modify the edit distance algorithm:

You can give different value for d(-,c) for character c.

So in the following example: "String1","String2", you can "punish" the score but letting d(1,2) be high, in contrast to "String 1","String1" that won't be punished because the score will be d(-,' ').

Upvotes: 1

Roman Saveljev
Roman Saveljev

Reputation: 2594

Alternatively, Approximate string matching could be of some use. I dont believe KMP would suit the purpose, because it is designed for precise substring matching

Upvotes: 0

Related Questions