StuffHappens
StuffHappens

Reputation: 6557

Finding a fuzzy match with bitap algorithm

Reciently I've looked through several implementation of bitap algorithm but what all of them do is finding the beginning point of fuzzy match. What I need is to find a match. There's an example:

Say we have following text: abcdefg

and a pattern: bzde

and we want to find all occurence of a pattern in text with at most 1 error (Edit distance is concidered).

So I need that the algorithm returns: bcde.

Is there a simple (or not simple =) ) way to do it? The original artical about this algorithm doens't answer the question.

Thank you for your help.

Upvotes: 2

Views: 2455

Answers (1)

rsp
rsp

Reputation: 23383

For a simple start you could approach it with a series of regular expressions where in every expression you replace 1 character with the . wildcard. Combining those expressions into one using the ( | ) construct to create one big regular expression.

Another way would be to scan the string keeping an errorcount and incrementing the offset you match at when too many errors are encountered.

Upvotes: 1

Related Questions