vegeta

Reputation: 13

Compare two text files in ruby

I have two text files, file1.txt and file2.txt. I want to find the differences between the files, with the equal, inserted, and deleted text each identified. The final goal is to create an HTML file in which the equal, inserted, and deleted text is highlighted with different colors and styles.

file1.txt

I am testing this ruby code for printing the file diff.

file2.txt

I am testing this code for printing the file diff. 

I am using this code:

 require 'diffy'

 doc1 = File.read('file1.txt')
 doc2 = File.read('file2.txt')
 final_doc = Diffy::Diff.new(doc1, doc2).each_chunk.to_a

The output is:

-I am testing this ruby code for printing the file diff.
+I am testing this code for printing the file diff.

However, I need the output in a format similar to the one below.

equal:
  I am testing this
insertion:
  ruby
equal:
  code for printing the file diff.

Python has difflib, which can do this, but I have not found equivalent functionality in Ruby.

Upvotes: 1

Views: 2344

Answers (1)

Ryan Kopf

Reputation: 534

I've found a few different libraries in Ruby for doing diffs, but they focus on comparing line by line. I wrote some code to compare a couple of relatively short strings and show the differences. It's a quick hack that works well as long as you don't need removed sections highlighted at the positions they were removed from (removed text is appended at the end instead); doing that would take a bit more thought about the algorithm. For a small amount of text at a time, though, it does the job.

The key, as with any language processing, is getting your tokenization right. You can't just process a string word by word. The best way would really be to loop through first, recursively, associate each token with its position in the text, and use that for the analysis, but the method below is fast and easy.
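To make the tokenization point concrete, here is a small illustration of what a punctuation-based lookbehind split produces compared with a plain whitespace split. The sample sentence and variable names are made up for the demo.

```ruby
text = "Hello, world! How are you?"

# Split *after* each ?, ., ! or , while keeping the punctuation on the token.
sentence_tokens = text.split(/(?<=[?.!,])/)
# => ["Hello,", " world!", " How are you?"]

# Plain whitespace split, for comparison.
word_tokens = text.split
# => ["Hello,", "world!", "How", "are", "you?"]
```

Note that a sentence with a single full stop at the end, like the one in the question, comes out as one token with the punctuation split, so any change inside it marks the whole sentence as changed.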

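  # Wraps each chunk of text2 (the new text) in a span marking it as unchanged
  # or added, then appends whatever is left of text1 (the old text) as removed.
  def self.change_differences(text1, text2) # oldtext, newtext
    splitter = /(?<=[?.!,])/ # Positive lookbehind: split after punctuation and keep it.
    remaining = text1.dup    # Work on a copy so the caller's string is not modified.
    result = ""
    text2.split(splitter).each do |token|
      # sub! removes the first occurrence and returns nil if the token was not found.
      if remaining.sub!(token, "")
        result += "<span class='diffsame'>" + token + "</span>"
      else
        result += "<span class='diffadd'>" + token + "</span>"
      end
    end
    # Anything still left in the old text was removed; it is listed at the end.
    remaining.split(splitter).each do |token|
      result += "<span class='diffremove'>" + token + "</span>"
    end
    result
  end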
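If you do need the removed text reported in place, one option is a word-level diff built on a longest-common-subsequence (LCS) table. The following is only a rough sketch of that idea, not the method above and not taken from any library: `word_diff` is a hypothetical helper name, and it assumes that splitting on whitespace is good enough tokenization for your text.

```ruby
# Hypothetical helper: word-level diff using a longest-common-subsequence table.
# Returns an array of [tag, text] pairs, tag being :equal, :insertion or :deletion.
def word_diff(old_text, new_text)
  a = old_text.split
  b = new_text.split

  # lcs[i][j] = length of the LCS of a[i..] and b[j..]
  lcs = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lcs[i][j] = if a[i] == b[j]
                    lcs[i + 1][j + 1] + 1
                  else
                    [lcs[i + 1][j], lcs[i][j + 1]].max
                  end
    end
  end

  # Walk both word lists from the start, tagging one word at a time.
  ops = []
  i = j = 0
  while i < a.size || j < b.size
    if i < a.size && j < b.size && a[i] == b[j]
      ops << [:equal, a[i]]
      i += 1
      j += 1
    elsif j < b.size && (i == a.size || lcs[i][j + 1] >= lcs[i + 1][j])
      ops << [:insertion, b[j]]
      j += 1
    else
      ops << [:deletion, a[i]]
      i += 1
    end
  end

  # Merge consecutive words that share a tag into a single chunk.
  ops.chunk_while { |x, y| x[0] == y[0] }
     .map { |run| [run[0][0], run.map(&:last).join(" ")] }
end

old_text = "I am testing this ruby code for printing the file diff."
new_text = "I am testing this code for printing the file diff."
chunks = word_diff(old_text, new_text)
# chunks is:
#   [[:equal, "I am testing this"],
#    [:deletion, "ruby"],
#    [:equal, "code for printing the file diff."]]
```

Swapping the arguments reports "ruby" as an insertion instead. The table needs memory proportional to the product of the two word counts, so this only suits short texts. Each chunk can then be wrapped in a span in the same way as above.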

Source: me!

Upvotes: 1
