Reputation: 51

sequence comparison with moved blocks detection

I need to compare 2 sequences and find an edit distance. Edits can include deletion and insertion operations (with modification weight 1 per symbol), and block move operations (with weight 0.1 per symbol)
For example:
A B C D E F G H
F G H A B C Y D X E
Block FGH was moved here.
Is there any existing algorithm to solve this task efficiently?

Upvotes: 1

Answers (3)

LarsH

Reputation: 28004

You might try A technique for isolating differences between files (via here):

An algorithm which uses the 'move' operator is described in P. Heckel's 1978 paper

(Sorry for the scribd interface, but I guess the paper has not been OCR'd.)

Upvotes: 2

j_random_hacker

Reputation: 51246

The abstract for Block Edit Models for Approximate String Matching looks promising.

Upvotes: 0

nlucaroni

Reputation: 47944

Yes; there are many algorithms and theory relating to biology; genome alignment and chromosome rearrangement. Without knowing your data, it's really hard to mention something more specific. I mention pancake sorting as a measure of rearrangement on another stackoverflow post, there are also a few other great options (compression, specifically). Of course that method will not be able to break apart your data into blocks. Dealing with small sequence data you should have no problem generating all groupings.

Upvotes: 0

sequence comparison with moved blocks detection

Answers (3)

Related Questions