Reputation: 1429
Does anybody know of a diff-like tool that can show me the changes between two text files, but ignore changes in whitespace including newlines?
Here's an example:
the quick brown fox jumped over the lazy bear. the quick brown fox jumped over the lazy bear. the quick brown fox jumped over the lazy bear. the quick brown fox jumped over the lazy bear.
quick brown fox jumped over the lazy bear. the quick brown fox jumped over the lazy bear. the quick brown fox jumped over the lazy bear. the quick brown fox jumped over the lazy bear.
All I did was delete one word and reflow it, but "diff -b" detects a change on every line (as it should; I'm not saying this is a bug in diff). But for large LaTeX files this is a major problem; change one word in a long paragraph and the diff you get back is basically useless.
By the way, I'm aware that this requires way more computational power than the usual lines-are-atomic diff. I'm only doing this on small human-generated files and am happy to wait a long time if I have to.
Upvotes: 17
Views: 4462
Reputation: 13312
wdiff does word-by-word alignment.
For an easy-to-read display in a terminal, run
wdiff -al <file1> <file2> | less
This will show (at least on my system) insertions in <file2> in boldface and deletions (words present only in <file1>) underlined.
Upvotes: 13
Reputation: 129363
One option is to split the entire file into words, one per line, and diff that. You don't get 100% the same result in terms of knowing the context, but it is very fine-tuned to the type of change you care about.
Example:
# -p makes perl read each input line, apply the substitution, and print the result
perl -pe 's/\s+/\n/g' file1 > file1.split_words
perl -pe 's/\s+/\n/g' file2 > file2.split_words
diff file1.split_words file2.split_words
You can do even better if the text has special properties. Specifically, if the reflow only happens within the bounds of a paragraph, where paragraphs are separated by 2 newlines in a row, simply replace all the single newlines with spaces and run a regular diff -w on the results.
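A minimal sketch of that paragraph-joining step, assuming paragraphs are separated by blank lines and a POSIX awk is available. The function name join_paragraphs and the .joined file names are made up for illustration:

```shell
# join_paragraphs (hypothetical helper): print each blank-line-separated
# paragraph of the given file as a single line.
join_paragraphs() {
    # RS="" switches awk to paragraph mode, where newlines also act as
    # field separators. Assigning $1=$1 rebuilds the record with single
    # spaces between words, which removes the newlines inside a paragraph.
    awk 'BEGIN { RS = "" } { $1 = $1; print }' "$1"
}
```

It could then be used along these lines: join_paragraphs file1 > file1.joined, the same for file2, then diff -w file1.joined file2.joined. A reflowed paragraph with one word removed should then show up as a single changed line rather than as a change on every line.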
Upvotes: 1