Reputation: 2283
I'm trying to build a file comparison script in Javascript, that takes two versions of a file and outputs something like Github showing additions and deletions. I'm having trouble with the logic of the algorithm though. Here's the psuedo-code for my process so far:
var j = 0;
// check current file line by line
for(i=0; i < currentFileArr.length; i++){
// see if the current line is different
if(currentFileArr[i] !== previousFileArr[j]){
if(previousFile.contains(currentFileArr[i])){
// line is a deletion. find next line that wasn't deleted
while(currentFileArr[i] !== previousFileArr[j]){
j++;
}
} else {
// line is an addition
}
} else { // lines are the same
j++;
}
}
The main problem is for lines that are not unique. Like new lines or lines with just a curly brace.
Upvotes: 0
Views: 124
Reputation: 3891
You need consider each unique line from your files as metachar, i.e. "character" for some expanded alphabet. By this way, both your files would be turned to "strings of metachars".
Mostly efficient way - to create hashtable, contains unique strings, and use indexes in the table as metachars.
Thereafter, you can search for minimal editing sequence between those strings by Levenshtein algorithm:
http://www.let.rug.nl/kleiweg/lev/levenshtein.html
http://en.wikipedia.org/wiki/Levenshtein_distance
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
Upvotes: 1