Reputation: 13
I am working on a project that compares two large text file versions (around 5000+ lines of text). The newer version contains potentially new and removed content. It is intended to help detect early changes in text versions as a team receives information from that text.
To solve the problem, I use the diff-match-patch libary, which allows me to identify already removed and new content. In the first step I search for changes.
public void compareStrings(String oldText, String newText){
DiffMatchPatch dmp = new DiffMatchPatch();
LinkedList<Diff> diffs = dmp.diffMain(previousString, newString, false);
}
Then I filter the list by the keywords INSERT/DELETE to get only the new/removed content.
public String showAddedElements(){
String insertions = "";
for(Diff elem: diffs){
if(elem.operation == Operation.INSERT){
insertions = insertions + elem.text + System.lineSeparator();
}
}
return insertions;
}
However, when I output the contents, I sometimes get only single letters, like (o, contr, ler), when only single characters were removed/added. Instead, I would like to output the whole sentence in which a change occured. Is there a way to also retrieve the line number from the DiffMatchPatch where the changes occured?
Upvotes: 0
Views: 714
Reputation: 13
I have found a solution by using another libary for the line extraction. The DiffUtils (Class DiffUtils of DMitry Maumenko) helped me achieve the desired goal.
/**
* Converts a String to a list of lines by dividing the string at linebreaks.
* @param text The text to be converted to a line list
*/
private List<String> fileToLines(String text) {
List<String> lines = new LinkedList<String>();
Scanner scanner = new Scanner(text);
while (scanner.hasNext()) {
String line = scanner.nextLine();
lines.add(line);
}
scanner.close();
return lines;
}
/**
* Starts a line-by-line comparison between two strings. The results are included
* in an intern list element for further processing.
*
* @param firstText The first string to be compared
* @param secondText The second string to be compared
*/
public void startLineByLineComparison(String firstText, String secondText){
List<String> firstString = fileToLines(firstText);
List<String> secondString = fileToLines(secondText);
changes = DiffUtils.diff(firstString, secondString).getDeltas();
}
After inserting the list with changes can be extracted by using the following code, whereas elem.getType() represents the type of difference between the text:
/**
* Returns a String filled with all removed content including line position
* @return String with removed content
*/
public String returnRemovedContent(){
String deletions = "";
for(Delta elem: changes){
if(elem.getType() == TYPE.DELETE){
deletions = deletions + appendLines(elem.getOriginal()) + System.lineSeparator();
}
}
return deletions;
}
Upvotes: 0