Reputation: 33658
I want a diff with interleaved lines, i.e. with "hunks" no longer than one line.
For example, instead of
-t1 = "Christmas 2013"
-t2 = "Easter 2013"
-t3 = "Thanksgiving 2013"
+t1 = "Christmas 2014"
+t2 = "Easter 2014"
+t3 = "Thanksgiving 2014"
I want this:
-t1 = "Christmas 2013"
+t1 = "Christmas 2014"
-t2 = "Easter 2013"
+t2 = "Easter 2014"
-t3 = "Thanksgiving 2013"
+t3 = "Thanksgiving 2014"
So far I have
git diff -U0 --ignore-space-at-eol before after holidays.ini
I tried setting --break-rewrites=0%/0%, --break-rewrites=100%/0% and so on, but it didn't change anything (I don't even know if it's relevant to my problem).
Upvotes: 18
Views: 1330
Reputation: 3529
workaround: transform the output of diff -y
function difflines() {
  # compare files line-by-line
  # limitation: this removes trailing whitespace
  # https://stackoverflow.com/a/71665866/10440128
  local W=1000 # depends on input width; longer lines get truncated
  local c=$(((W+1)/2)) # center. +1 to round up
  local ca=$((c-2)) # width of the left column
  local cb=$((c+2)) # offset of the right column
  local color=true
  local red=''
  local green=''
  local reset=''
  if $color; then
    red=$'\e[31m'
    green=$'\e[32m'
    reset=$'\e[0m'
  fi
  local L a b
  # IFS= keeps leading whitespace, which the fixed column offsets rely on
  diff -y -t -W "$W" "$1" "$2" | while IFS= read -r L
  do
    a="${L:0:$ca}"
    a="$(printf '%s\n' "$a" | sed -E 's/ +$//')"
    b="${L:$cb}"
    b="$(printf '%s\n' "$b" | sed -E 's/ +$//')"
    if [ "$a" = "$b" ]; then
      # unchanged line
      printf '%s\n' " $a"
      echo
      continue
    fi
    printf '%s\n' "$red-$a$reset"
    printf '%s\n' "$green+$b$reset"
    echo
  done
}
example:
cat >file1 <<EOF
t1 = "Christmas 2013"
t2 = "Easter 2013"
t3 = "Thanksgiving 2013"
EOF
cat >file2 <<EOF
t1 = "Christmas 2013"
t2 = "Easter 2014"
t3 = "Thanksgiving 2014"
EOF
difflines file1 file2
output
t1 = "Christmas 2013"
-t2 = "Easter 2013"
+t2 = "Easter 2014"
-t3 = "Thanksgiving 2013"
+t3 = "Thanksgiving 2014"
based on this answer
Upvotes: 0
Reputation: 6061
I found a workaround for my problem rather than a solution: I don't change the behaviour of diff. Instead, AWK can post-process its output to cut hunks with multiple lines into slices of one line:
diffungroup
awk -F ',|c' ' # process lines such as "3,4c3,4" and the lines that follow
# assumes both sides of the change have the same number of lines
/^[0-9]+,[0-9]+c[0-9]+,[0-9]+$/ {
    ss = $1; se = $2; ds = $3; de = $4;
    for (i = ss; i <= se; i++) { # buffer the old ("<") lines
        getline
        a[i] = $0
    }
    getline # skip "---"
    i = ss
    for (j = ds; j <= de; j++) { # emit one single-line change per pair
        print i "c" j
        print a[i++]
        print "---"
        getline
        print $0
    }
    next
}
{ print }
' "$@"
It transforms the output of diff like:
1,2c1,2
< Salve<br/>
< Quomodo te habes?<br/>
---
> Salvete<br/>
> Quomodo vos habetis?<br/>
into:
1c1
< Salve<br/>
---
> Salvete<br/>
2c2
< Quomodo te habes?<br/>
---
> Quomodo vos habetis?<br/>
In the context explained in the question above, I invoke it as follows:
diff short.html.orig short.html > short.diff
./diffungroup short.diff > long.diff
patch -z .orig long.html long.diff
And, as I said above, it works like a charm.
Upvotes: 3
Reputation: 5133
I'm glad I'm not the only one who wants to do this.
The following shows old and new on adjacent lines via paste and uses uniq as the World's Worst Diff:
git show HEAD:./holidays.ini | paste -d '\n' - holidays.ini | uniq -u
"Christmas 2013"
"Christmas 2014"
"Easter 2013"
"Easter 2014"
"Thanksgiving 2013"
"Thanksgiving 2014"
Upvotes: 1
Reputation: 28180
If the diff is not required to be textual, you could use KDiff3.
This will give even finer granularity than single lines.
Upvotes: 0
Reputation: 487755
None of the built-in diff algorithms will behave this way.
I'm curious as to what you'd like to see if, e.g., the change was to add one line and replace two others, so that (to grab your example) you'd have something like this:
-t1 = "Christmas 2013"
+t1 = "Christmas 2014"
+t2 = "Easter 2014"
-t3 = "Thanksgiving 2013"
+t3 = "Thanksgiving 2014"
Here, for t2, there's nothing to delete.
In any case, I believe your best bet is likely to post-process the output of git diff -U0.
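As a rough illustration of such post-processing, here is a minimal sketch. It is not anything git provides: interleave_hunks is a made-up name, and it assumes plain, uncolored -U0 output on stdin. Within each hunk it pairs the k-th removed line with the k-th added line and prints whatever is left over on the longer side unpaired, which is one possible answer to the "nothing to delete" case above.

```shell
# interleave_hunks: hypothetical helper, not part of git.
# Reads a unified diff produced with -U0 on stdin and, inside each hunk,
# prints the k-th "-" line followed by the k-th "+" line.
interleave_hunks() {
  awk '
    # Print the buffered lines of the current hunk, interleaved.
    function emit_pairs(   i, n) {
      n = (nminus > nplus) ? nminus : nplus
      for (i = 1; i <= n; i++) {
        if (i <= nminus) print minus[i]
        if (i <= nplus)  print plus[i]
      }
      nminus = 0; nplus = 0
    }
    /^@@/            { emit_pairs(); inhunk = 1; print; next }  # hunk header
    inhunk && /^-/   { minus[++nminus] = $0; next }             # removed line
    inhunk && /^[+]/ { plus[++nplus] = $0; next }               # added line
    { emit_pairs(); inhunk = 0; print }   # file headers and anything else
    END { emit_pairs() }
  '
}
```

It would be used along the lines of git diff -U0 --ignore-space-at-eol before after holidays.ini | interleave_hunks. Hunk headers are passed through unchanged, so the result is meant for reading; the sketch makes no attempt to guarantee that it still applies as a patch.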
If you're on a Unix-ish system you could also use the original, non-unified diff format, e.g.:
$ git diff -U0
diff --git a/like_min.py b/like_min.py
index 05b9a4d..1c90084 100644
--- a/like_min.py
+++ b/like_min.py
@@ -1 +1 @@
-def like_min(iterable, key=None):
+def like_min(iterable, key=None): # comment
@@ -9 +9 @@ def like_min(iterable, key=None):
- for candidate in it:
+ for candidate in it: # another comment
$ git show HEAD:like_min.py | diff - like_min.py
1c1
< def like_min(iterable, key=None):
---
> def like_min(iterable, key=None): # comment
9c9
< for candidate in it:
---
> for candidate in it: # another comment
which might be easier to post-process (depending on many details). In particular, each change starts with a line number and letter code (a for add, c for change, d for delete), so there's no need to figure out whether something is a pure add or a pure delete, versus a change you'd like to split into one line at a time. You still might have to turn a "change" into a "change followed by add-or-delete" if the new number of lines does not match:
$ git show HEAD:like_min.py | diff - like_min.py
1c1,2
< def like_min(iterable, key=None):
---
> def like_min(iterable, key=None): # comment
> def like_min(iterable, key=None): # comment
9c10
< for candidate in it:
---
> for candidate in it: # another comment
Also, "old diff" may have different (and not the desired) white-space-ignoring options.
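The "change followed by add-or-delete" step could be sketched roughly as below. This is only an illustration under assumptions: split_changes is a hypothetical name, the input is taken to be normal-format diff output on stdin, and "\ No newline at end of file" markers are not handled. Each c command is split into one-line changes, and lines left over on the longer side are emitted as a single a or d command.

```shell
# split_changes: hypothetical helper. Reads normal ("old style") diff output
# on stdin and splits every "c" command into one-line changes, followed by
# an "a" or "d" command for whatever lines are left over on the longer side.
split_changes() {
  awk '
    # Format a line range the way diff does: "N" or "N,M".
    function rng(lo, hi) {
      if (lo == hi) return lo ""
      return lo "," hi
    }
    /^[0-9]+(,[0-9]+)?c[0-9]+(,[0-9]+)?$/ {
      split($0, side, "c")
      n = split(side[1], o, ","); os = o[1] + 0; oe = (n > 1) ? o[2] + 0 : os
      n = split(side[2], d, ","); ds = d[1] + 0; de = (n > 1) ? d[2] + 0 : ds
      nold = oe - os + 1; nnew = de - ds + 1
      for (i = 1; i <= nold; i++) { getline; oldl[i] = $0 }
      getline                                # skip the "---" separator
      for (i = 1; i <= nnew; i++) { getline; newl[i] = $0 }
      m = (nold < nnew) ? nold : nnew        # number of lines that can be paired
      for (i = 1; i <= m; i++) {
        hdr = (os + i - 1) "c" (ds + i - 1)
        print hdr; print oldl[i]; print "---"; print newl[i]
      }
      if (nnew > m) {                        # extra new lines: pure add
        hdr = oe "a" rng(ds + m, de)
        print hdr
        for (i = m + 1; i <= nnew; i++) print newl[i]
      }
      if (nold > m) {                        # extra old lines: pure delete
        hdr = rng(os + m, oe) "d" de
        print hdr
        for (i = m + 1; i <= nold; i++) print oldl[i]
      }
      next
    }
    { print }
  '
}
```

With the example above, git show HEAD:like_min.py | diff - like_min.py | split_changes would turn 1c1,2 into a 1c1 change followed by a 1a2 add, and leave 9c10 alone.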
Fiddling with --break-rewrites is orthogonal to what you want: it just changes the point at which git considers a file as "wholly rewritten", and thus shows the change as "delete entire previous file contents, insert all-new contents".
The default breakpoint is, according to the documentation, -B50%/60%, which specifies that no more than 60% of the file can be "rewritten", or equivalently, "at least 40% of the file still matches". You might want to decrease this, but probably don't want to increase it. (Incidentally, I can't seem to set this to 0%; setting it to 1% makes most changes become complete rewrites, but small changes, like changing just one line of a file, still show up as small changes rather than total-file-rewrites. This is probably because the similarity index is not based purely on line-at-a-time changes, but also includes intra-line matches.)
(That first number, the 50% in -B50%/60%, is the similarity index value used for rename detection, assuming rename detection is enabled. Think of the two numbers as the "similarity and dissimilarity index" values: similarity index is "how close is file 1 to file 2", and dissimilarity is just 100% minus similarity.)
Upvotes: 1