Hele
Hele

Reputation: 1578

Java efficiency in comparing large checksum lists

I'm trying to compare two directories having about 15k files each for any changes. A is the newer version and B must be updated to it.

I have two large checksum list files, call them A and B. A is newer and B is an older version. Each have about 15k entries that look sort of like the following :

<entry1 -filepath> <entry1 -checksum>
<entry2 -filepath> <entry2 -checksum>
<entry3 -filepath> <entry3 -checksum>
.                  .
.                  .
.                  .

The entries are listed in alphabetical order. The two files need to be compared to check the following :

1. Two entries have the same file path but different checksums.
2. An entry exists in file A but not in file B.
3. An entry exists in file B but not in file A.

My proposal algorithm :

int currentBLine = -1;

for(int index = 0; index < A.length; index++)
{
    String newfilepath = A[index].getFilePath();
    String newchecksum = A[index].getCheckSum();

    for(; currentBLine < B.length; currentBLine++)
    {
        String oldfilepath = B[currentBLine].getFilePath();
        String oldchecksum = B[currentBLine].getCheckSum();

        if(filepath.compareTo(oldfilepath) > 0)
        {
            deleteFile(oldfilepath);
        }
        else if(filepath.compareTo(oldfilepath) == 0)
        {
            if(checksum.equals(oldchecksum)
            {
                currentBLine++;
                break;
            }
            else
            {
                updateFile(oldfilepath, newfilepath);
                break;
            }
        }
        else
        {
            createFile(newfilepath);
            break;
        }
    }
}

Is this the most efficient way of doing this? Am I doing something wrong here?

If anyone sees an XY problem, just let me know and I will fill in the background.

Upvotes: 4

Views: 538

Answers (1)

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726779

The program that you have (two nested loops with the break in the inner loop) implements the standard algorithm for processing two sorted collections together. It is similar to the one that you use when merging two sorted lists: make two indexes, one for each list, and loop until both lists reach the end.

You can bring it to its classic form by making it a single loop instead of using two nested loops. At each step of the loop you perform the comparison similar to what you've got in the three-way if statement that you have. The only difference is that you wouldn't use a break, and you would need to check the indexes into A and B to be within their limits. If both indexes are within A and B limits, compare the files and check sums the way that you have coded. If you reached the end of A, delete the B file. If you reached the end of B, copy the A file. The loop ends once you have exhausted both lists.

Upvotes: 1

Related Questions