Logic for a File Comparison Script

Question

I'm trying to build a file comparison script in Javascript, that takes two versions of a file and outputs something like Github showing additions and deletions. I'm having trouble with the logic of the algorithm though. Here's the psuedo-code for my process so far:

var j = 0;
// check current file line by line
for(i=0; i < currentFileArr.length; i++){

    // see if the current line is different
    if(currentFileArr[i] !== previousFileArr[j]){

        if(previousFile.contains(currentFileArr[i])){
            // line is a deletion. find next line that wasn't deleted
            while(currentFileArr[i] !== previousFileArr[j]){
                j++;
            }
        } else {
            // line is an addition
        }
    } else { // lines are the same
        j++;
    }
}

The main problem is for lines that are not unique. Like new lines or lines with just a curly brace.

olegarch · Accepted Answer

You need consider each unique line from your files as metachar, i.e. "character" for some expanded alphabet. By this way, both your files would be turned to "strings of metachars".

Mostly efficient way - to create hashtable, contains unique strings, and use indexes in the table as metachars.

Thereafter, you can search for minimal editing sequence between those strings by Levenshtein algorithm:

http://www.let.rug.nl/kleiweg/lev/levenshtein.html

http://en.wikipedia.org/wiki/Levenshtein_distance

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance

Logic for a File Comparison Script

Answers (1)

Related Questions