Burkhard

Reputation: 14738

Finding common blocks

I have two files (f1 and f2) containing some text (or binary data).
How can I quickly find common blocks?

e.g.
f1: ABC DEF
f2: XXABC XEF

output:

common blocks:
length 4: "ABC " in f1@0 and f2@2
length 2: "EF" in f1@5 and f2@7

Upvotes: 4

Views: 579

Answers (3)

David Medinets

Reputation: 5618

The open-source PMD project has a cut-and-paste detector module which is mentioned on this page: http://pmd.sourceforge.net/integrations.html.

Upvotes: 1

Torsten Marek

Reputation: 86552

Wikipedia has some pseudocode for finding the longest common substring between two sequences of data. In your case, you simply extract from the table all common substrings that cannot be extended any further, i.e. that are not part of a longer common substring at the same positions (maximal common substrings).
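As a rough sketch of that idea (not the Wikipedia pseudocode verbatim), the dynamic-programming table can be built in Python and every run that is not continued on the next diagonal cell reported as a block. The function name `common_blocks` and the `min_len` parameter are my own choices for illustration. Note that this takes O(len(f1) * len(f2)) time and memory, so it is only practical for fairly small inputs.

```python
def common_blocks(a, b, min_len=2):
    """Return (length, pos_in_a, pos_in_b) for each maximal common block.

    Works on str or bytes. min_len is a hypothetical cutoff that
    filters out very short matches.
    """
    n, m = len(a), len(b)
    # table[i][j] = length of the common run ending at a[i-1] and b[j-1].
    # One extra row and column of zeros lets us look one cell ahead.
    table = [[0] * (m + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1

    blocks = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            length = table[i][j]
            # A run is maximal when the next diagonal cell does not extend it.
            if length >= min_len and table[i + 1][j + 1] == 0:
                blocks.append((length, i - length, j - length))
    return sorted(blocks, reverse=True)


if __name__ == "__main__":
    for length, p1, p2 in common_blocks("ABC DEF", "XXABC XEF"):
        print(f"length {length}: f1@{p1} and f2@{p2}")
```

For real files you would read them in binary mode and pass the two `bytes` objects; for large files a suffix-array or hashing-based approach would scale better than this table.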

Upvotes: 1

torial

Reputation: 13121

This is a great tool for such purposes: http://sourceforge.net/projects/duplo/

Upvotes: 2
