JamaicaBot

Reputation: 13

Retrieve Line Numbers from Diff Patch Match

I am working on a project that compares two versions of a large text file (5000+ lines each). The newer version may contain both added and removed content. The goal is to detect changes between versions early, because a team relies on the information in that text.
To solve this, I use the diff-match-patch library, which lets me identify removed and added content. In the first step I compute the differences.

    private LinkedList<Diff> diffs;

    public void compareStrings(String oldText, String newText){
        DiffMatchPatch dmp = new DiffMatchPatch();
        // Store the result in a field so that later methods can filter it
        diffs = dmp.diffMain(oldText, newText, false);
    }

Then I filter the list by operation (INSERT/DELETE) to get only the added or removed content.

    public String showAddedElements(){
        String insertions = "";
        for(Diff elem: diffs){
            if(elem.operation == Operation.INSERT){
                insertions = insertions + elem.text + System.lineSeparator();
            }
        }
        return insertions;
    }

However, when I output the contents, I sometimes get only fragments such as "o", "contr" or "ler", because only a few characters were removed or added. Instead, I would like to output the whole sentence in which a change occurred. Is there a way to also retrieve from DiffMatchPatch the line number where a change occurred?

Upvotes: 0

Views: 714

Answers (1)

JamaicaBot

Reputation: 13

I found a solution by using another library for the line extraction. The DiffUtils class from java-diff-utils (by Dmitry Naumenko) helped me achieve the desired goal.

/**
 * Converts a String to a list of lines by dividing the string at linebreaks.
 * @param text The text to be converted to a line list
 * @return The lines of the text, in order
 */
private List<String> fileToLines(String text) {
    List<String> lines = new LinkedList<String>();

    // hasNextLine() pairs with nextLine(); hasNext() would skip trailing blank lines
    Scanner scanner = new Scanner(text);
    while (scanner.hasNextLine()) {
        lines.add(scanner.nextLine());
    }
    scanner.close();
    return lines;
}
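
As a side note, the same split can be sketched without a Scanner. This is only an illustration, assuming Java 8 or newer (for the `\R` linebreak matcher); the class name `LineSplitter` and the method name `toLines` are hypothetical and not part of the code above. Unlike the Scanner version, a final newline produces one trailing empty entry, which is harmless as long as both texts are split the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class LineSplitter {
    /** Splits text at any linebreak (\n, \r\n, \r, ...), keeping empty lines. */
    static List<String> toLines(String text) {
        if (text.isEmpty()) {
            return new ArrayList<>();
        }
        // \R matches any linebreak sequence; the limit -1 keeps trailing empty strings
        return new ArrayList<>(Arrays.asList(text.split("\\R", -1)));
    }
}
```

Keeping empty lines matters here because the line index reported by a diff only matches the file if no lines were dropped while splitting.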

/**
 * Starts a line-by-line comparison between two strings. The results are stored
 * in an internal list for further processing.
 * 
 * @param firstText The first string to be compared
 * @param secondText The second string to be compared
 */
public void startLineByLineComparison(String firstText, String secondText){
    List<String> firstString = fileToLines(firstText);
    List<String> secondString = fileToLines(secondText);
    changes = DiffUtils.diff(firstString, secondString).getDeltas();
}

Once the list is populated, the changes can be extracted with the following code, where elem.getType() gives the type of difference (for example DELETE) for each delta:

/**
 * Returns a String filled with all removed content including line position
 * @return String with removed content
 */
public String returnRemovedContent(){
    String deletions = "";
    for(Delta elem: changes){
        if(elem.getType() == TYPE.DELETE){
            deletions = deletions + appendLines(elem.getOriginal()) + System.lineSeparator();
        }
    }
    return deletions;
}
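
The appendLines helper is not shown here. A minimal sketch of what such a helper could look like follows, assuming it is given the chunk's zero-based start position and its lines (in java-diff-utils these would presumably come from `getPosition()` and `getLines()` on the chunk returned by `elem.getOriginal()`). The class name `ChunkFormatter`, the two-argument signature, and the "Line N:" output format are assumptions for illustration only.

```java
import java.util.List;

// Hypothetical formatter; the real appendLines from the answer may differ.
final class ChunkFormatter {
    /**
     * Formats the lines of one chunk, prefixing each with its 1-based line number.
     * position is assumed to be the zero-based index of the chunk's first line.
     */
    static String appendLines(int position, List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                sb.append(System.lineSeparator());
            }
            // +1 converts the zero-based index to a human-readable line number
            sb.append("Line ").append(position + i + 1).append(": ").append(lines.get(i));
        }
        return sb.toString();
    }
}
```

With a helper of this shape, each removed line would be reported together with its position in the old text, which is what the question asked for.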

Upvotes: 0
