Reputation: 51
I need to compare 2 sequences and find an edit distance. Edits can include deletion and insertion operations (with modification weight 1 per symbol), and block move operations (with weight 0.1 per symbol)
For example:
A B C D E F G H
F G H A B C Y D X E
Block FGH was moved here.
Is there any existing algorithm to solve this task efficiently?
Upvotes: 1
Views: 275
Reputation: 28004
You might try A technique for isolating differences between files (via here):
An algorithm which uses the 'move' operator is described in P. Heckel's 1978 paper
(Sorry for the scribd interface, but I guess the paper has not been OCR'd.)
Upvotes: 2
Reputation: 51246
The abstract for Block Edit Models for Approximate String Matching looks promising.
Upvotes: 0
Reputation: 47944
Yes; there are many algorithms and theory relating to biology; genome alignment and chromosome rearrangement. Without knowing your data, it's really hard to mention something more specific. I mention pancake sorting as a measure of rearrangement on another stackoverflow post, there are also a few other great options (compression, specifically). Of course that method will not be able to break apart your data into blocks. Dealing with small sequence data you should have no problem generating all groupings.
Upvotes: 0