Reputation: 2701
Consider the following files and diff results:
a1.txt
a
b
My name is Ian
a2.txt
a
a
b
My name is John
Running `diff --side-by-side --suppress-common-lines a1.txt a2.txt` produces:
> a
My name is Ian | My name is John
This correctly states that `a` was added in `a2.txt` and that `My name is Ian` changed to `My name is John`.
However, if I remove the `b` line from both files, the result is different:
b1.txt
a
My name is Ian
b2.txt
a
a
My name is John
Running `diff --side-by-side --suppress-common-lines b1.txt b2.txt` produces:
My name is Ian | a
> My name is John
This states that the line `My name is Ian` changed to `a` and that `My name is John` was added to `b2.txt`.
Even though the result of the second comparison is technically valid, the difference between `a1.txt` and `a2.txt` is equivalent to that of `b1.txt` and `b2.txt`, so why would the result not be equal?
Is there anything I can do such that the second comparison produces the same output as the first?
Upvotes: 3
Views: 78
Reputation: 66394
The discrepancy you observe between the two examples is normal; it just conflicts with your expectations of what `diff` does. The `diff` utility solves the longest-common-subsequence problem, using lines as units/atoms.
[...] the difference between `a1.txt` and `a2.txt` is equivalent to that of `b1.txt` and `b2.txt`, so why would the result not be equal?
Here, the longest common subsequences in your two examples are different and, roughly speaking, don't "line up" the same way. In the first example, you have
    # a1.txt          # a2.txt           # line in common?
                      a                  n
    a                 a                  y
    b                 b                  y
    My name is Ian    My name is John    n
whereas, in the second example, you have
    # b1.txt          # b2.txt           # line in common?
    a                 a                  y
    My name is Ian    a                  n
                      My name is John    n
Therefore, as far as `diff` is concerned, the differences between the two pairs of files are not equivalent. `diff` has no memory that all you did to obtain the `b[12].txt` files was to remove the `b` line from each of the `a[12].txt` files. All it sees is that the longest common subsequence now consists only of the one line that contains `a`, and it deduces the difference between the two `b[12].txt` files from that.
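To make the point about common subsequences concrete, here is a minimal Python sketch that computes one longest common subsequence for each pair of files, treating each line as an atom. It is a textbook dynamic-programming version written for illustration, not the algorithm that `diff` itself runs, and the function name `lcs_lines` is made up for this example.

```python
def lcs_lines(left, right):
    """Return one longest common subsequence of two lists of lines."""
    rows, cols = len(left), len(right)
    # table[i][j] = LCS length of the suffixes left[i:] and right[j:]
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if left[i] == right[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one subsequence.
    common, i, j = [], 0, 0
    while i < rows and j < cols:
        if left[i] == right[j]:
            common.append(left[i])
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


a1 = ["a", "b", "My name is Ian"]
a2 = ["a", "a", "b", "My name is John"]
b1 = ["a", "My name is Ian"]
b2 = ["a", "a", "My name is John"]

print(lcs_lines(a1, a2))  # ['a', 'b']
print(lcs_lines(b1, b2))  # ['a']
```

Once the `b` line is gone, the only common line left is a single `a`, and nothing in the input says which of the two `a` lines in `b2.txt` it should be paired with.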
Is there anything I can do such that the second comparison produces the same output as the first?
Short of using a different diff algorithm (or implementing your own), I don't think so.
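For what "implementing your own" could look like, here is a rough Python sketch. It relies on the fact that the longest common subsequence is not unique here: in both pairs, the `a` on the left could be paired with either `a` on the right. The sketch breaks that tie by preferring matches closer to the end of the files, which happens to give the output you expected for these two pairs. The names `align_late` and `side_by_side` are hypothetical, the output is only a loose imitation of `diff`'s side-by-side markers (no column padding), and I make no claim that this tie-breaking rule gives the result you would want in general.

```python
def align_late(left, right):
    """Return matched (left, right) index pairs of one LCS, favouring late matches."""
    rows, cols = len(left), len(right)
    # table[i][j] = LCS length of the prefixes left[:i] and right[:j]
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if left[i - 1] == right[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Backtrack from the bottom-right corner, so equal lines near the end pair up first.
    pairs, i, j = [], rows, cols
    while i > 0 and j > 0:
        if left[i - 1] == right[j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def side_by_side(left, right):
    """Render only the differing lines, with common lines suppressed."""
    out, li, ri = [], 0, 0
    # A sentinel pair past both ends flushes the gap after the last match.
    for mi, mj in align_late(left, right) + [(len(left), len(right))]:
        while li < mi and ri < mj:   # unmatched lines on both sides: a change
            out.append(f"{left[li]} | {right[ri]}")
            li, ri = li + 1, ri + 1
        while li < mi:               # line only in the left file
            out.append(f"{left[li]} <")
            li += 1
        while ri < mj:               # line only in the right file
            out.append(f"> {right[ri]}")
            ri += 1
        li, ri = mi + 1, mj + 1      # skip the common line itself
    return out


a1 = ["a", "b", "My name is Ian"]
a2 = ["a", "a", "b", "My name is John"]
b1 = ["a", "My name is Ian"]
b2 = ["a", "a", "My name is John"]

print("\n".join(side_by_side(a1, a2)))
print("\n".join(side_by_side(b1, b2)))
```

With this tie-breaking rule, both calls print `> a` followed by `My name is Ian | My name is John`.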
Upvotes: 3