swordfish

Reputation: 4995

Efficient way to compare three text files in vb.net

I have a VB.NET program in which I must compare three text files (two against one) and verify that they are all the same. If there is even one change, I must know where it is: which text file and which line. The text files are formatted like this:

timestamp|ab|someval
timestamp|ab|someval1
timestamp|bc|someval2
timestamp|bc|someval2

All the text files have the same format, but the values might be in a different order. For instance:

text1.txt

ts|av|2
ts|ab|3
ts|av|4

text2.txt

ts|av|4
ts|ab|3
ts|av|2

This should not fail, as the files contain the same values. Can anyone tell me how I can go about this?

Upvotes: 1

Views: 887

Answers (1)

Jim Mischel

Reputation: 134005

So what you have, in effect, is potentially three different permutations of the same items. So if the text files were files of integers, then these three would be considered identical:

1,2,3 3,2,1 2,1,3

but 1,2,4 would not be.

If the first file is small enough to fit into memory, then you can use a simple HashSet(Of String) (I hope I got the VB syntax right). Note that you only have to keep the contents of ONE file in memory. The others are read line by line.

For the first file, read each line into an object (or perhaps just save it as a string) and add it to your HashSet. Now, for each of the other two files (assuming the hashSet is called file1Data):

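// C# sketch; needs "using System;" and "using System.IO;".
int lineNum = 0;
foreach (var line in File.ReadLines(filename))
{
    ++lineNum;
    // A line that does not appear in the first file is a difference.
    if (!file1Data.Contains(line))
    {
        Console.WriteLine($"{filename}, line {lineNum}: {line}");
    }
}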
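
One caveat worth sketching: a plain set lookup only flags lines that appear in the second or third file but not in the first. It will not notice a line that is missing from the other file, or a line that is repeated a different number of times. A possible variation, assuming duplicates matter, is to count occurrences in a dictionary instead. The class and method names below (MultisetCompare, CountLines, Compare) are hypothetical and only for illustration; this is a minimal sketch, not a tested tool.

```csharp
using System;
using System.Collections.Generic;

static class MultisetCompare
{
    // Counts how many times each distinct line occurs in the reference file.
    public static Dictionary<string, int> CountLines(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            counts.TryGetValue(line, out int n); // n stays 0 when the line is new
            counts[line] = n + 1;
        }
        return counts;
    }

    // Reports lines of the other file that have no unmatched copy in the
    // reference, then reference lines that the other file never used up.
    public static List<string> Compare(
        Dictionary<string, int> reference, IEnumerable<string> otherLines, string otherName)
    {
        var remaining = new Dictionary<string, int>(reference); // copy, so the reference can be reused
        var problems = new List<string>();
        int lineNum = 0;
        foreach (var line in otherLines)
        {
            ++lineNum;
            if (remaining.TryGetValue(line, out int n) && n > 0)
                remaining[line] = n - 1;
            else
                problems.Add($"{otherName}, line {lineNum}: no matching line in reference: {line}");
        }
        foreach (var pair in remaining)
            if (pair.Value > 0)
                problems.Add($"{otherName}: missing {pair.Value} occurrence(s) of: {pair.Key}");
        return problems;
    }
}
```

To use it, build the counts once with MultisetCompare.CountLines(File.ReadLines("text1.txt")) and call Compare once for each of the other two files, again passing File.ReadLines(...) so that only the first file's data is held in memory. An empty result means the file is a permutation of the reference.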

If the files aren't small enough to fit into memory, then I see no other option than to do an external sort on each file, then either use an existing diff program, or write a simple merge comparison.
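
The "simple merge comparison" could look roughly like the sketch below. It assumes both inputs have already been sorted (for example by an external sort) using ordinal string comparison, and it walks the two sequences in lockstep. The names SortedMerge and Diff are hypothetical. Note that sorting discards the original line positions, so reporting the original line number would require carrying it along with each line, which this sketch does not do.

```csharp
using System;
using System.Collections.Generic;

static class SortedMerge
{
    // Walks two already-sorted line sequences together and reports every
    // line that appears in only one of them. Equal lines cancel one-to-one,
    // so differing duplicate counts are reported as well.
    public static List<string> Diff(IEnumerable<string> sortedA, IEnumerable<string> sortedB)
    {
        var result = new List<string>();
        using (var a = sortedA.GetEnumerator())
        using (var b = sortedB.GetEnumerator())
        {
            bool hasA = a.MoveNext(), hasB = b.MoveNext();
            while (hasA || hasB)
            {
                // When one side is exhausted, treat the other side's line as unmatched.
                int cmp = !hasA ? 1 : !hasB ? -1 : string.CompareOrdinal(a.Current, b.Current);
                if (cmp == 0) { hasA = a.MoveNext(); hasB = b.MoveNext(); }
                else if (cmp < 0) { result.Add("only in A: " + a.Current); hasA = a.MoveNext(); }
                else { result.Add("only in B: " + b.Current); hasB = b.MoveNext(); }
            }
        }
        return result;
    }
}
```

Passing File.ReadLines(path) for each sorted file keeps memory use small, because the lines are streamed rather than loaded all at once.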

Upvotes: 1
