Max

Reputation: 27

Diff between two files by block, not by line

I found a similar question, "What's the best way to diff blocks of text between two files?", but it doesn't work for me.

I have two files:

file1

block1
 sometext

block2
 sometext
 sometext2
 changedtext3

newblock3
 newtext

file2

block1
 sometext

block2
 sometext
 sometext2

When I run diff file1 file2, I get this result:

 changedtext3
newblock3
 newtext

which is correct. However, is there a way to get the whole block in which the change was made, not just the line that was added or changed?

I would like to get a result like this:

block2
 sometext
 sometext2
 changedtext3
newblock3
 newtext

Thank you

Upvotes: 1

Views: 103

Answers (1)

tshiono

Reputation: 22042

How about joining the lines of each block into a single line with a delimiter, running diff, and then splitting the lines back again:

diff <(awk -v RS="" '{gsub("\n", "\033")} 1' file1) \
     <(awk -v RS="" '{gsub("\n", "\033")} 1' file2) \
     | tr "\033" "\n"

Output:

2,3c2
< block2
 sometext
 sometext2
 changedtext3
< newblock3
 newtext
---
> block2
 sometext
 sometext2

  • The option -v RS="" tells awk to split the input on empty lines, so each record corresponds to one block.
  • gsub("\n", "\033") replaces every newline within the record with an escape character (\033), assuming that character does not occur in the files.
  • The 1 after the closing curly bracket is an always-true pattern, which is equivalent to "print the record".
  • The diff command then compares the files block by block, because each block is now a single line.
  • Finally, tr "\033" "\n" restores the newline characters within each block.

The output is not identical to your desired output, but will be close.
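
If only the changed or added blocks of file1 are wanted, without the diff markers, a set-membership check in awk might get closer to the desired output. This is a minimal sketch of a different technique, not the diff-based approach above. It assumes that blocks are separated by empty lines, that file2 is not empty, and that a block counts as changed when no identical block exists anywhere in file2. The function name changed_blocks is made up for illustration.

```shell
# Hypothetical helper: print every block of "$1" that has no identical
# block in "$2". Blocks are paragraphs separated by empty lines (RS="").
changed_blocks() {
    awk -v RS="" '
        NR == FNR { seen[$0]; next }   # first file read ($2): remember its blocks
        !($0 in seen)                  # second file read ($1): print unseen blocks
    ' "$2" "$1"
}
```

Calling changed_blocks file1 file2 on the sample files should print block2 (including changedtext3) followed by newblock3. Blocks that exist only in file2 are not reported, so the arguments would have to be swapped to see those.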

[Update]
If the blocks may not be separated by empty lines, please try:

diff <(sed $'s/^[^[:blank:]]/\\n&/' file1 | awk -v RS="" '{gsub("\n", "\033")} 1') \
     <(sed $'s/^[^[:blank:]]/\\n&/' file2 | awk -v RS="" '{gsub("\n", "\033")} 1') \
     | tr "\033" "\n"

As a preprocessing step, sed inserts a newline before every line that starts with a non-blank character, i.e. the first line of each block. This works whether or not the blocks are already separated by empty lines.
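
The sed command relies on \n in the replacement text being turned into a newline, which GNU sed does; other sed implementations may treat it differently. As a hedged alternative, the same preprocessing can be sketched in awk alone. The helper name normalize is made up for illustration, and the sketch assumes that continuation lines are indented with spaces or tabs.

```shell
# Hypothetical helper: emit each block of "$1" as a single line.
normalize() {
    # Stage 1: print an empty line before every line that starts in column one,
    #          so each block becomes its own paragraph.
    # Stage 2: same join step as above, newlines inside a block become ESC.
    awk '/^[^ \t]/ { print "" } { print }' "$1" |
        awk -v RS="" '{ gsub("\n", "\033") } 1'
}
```

In bash it could then be used as diff <(normalize file1) <(normalize file2) | tr "\033" "\n", which should behave like the command above for input with or without empty lines between blocks.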

Upvotes: 2
