r_31415

Reputation: 8972

Extract meaningful changes with git diff

I'm trying to obtain the changes between commits for a large number of HTML documents, but I quickly noticed that most of them are unimportant: they usually come from logging, version strings changed to prevent caching, or external scripts. For example:

<a class="support-ga" target="_blank" href="#">0fb63cacd50e / 0fb63cacd50e @ 
-app-151</a>
+app-107</a>
<input type='hidden' name='csrfmiddlewaretoken' 
-value='82NB5DdySoICu1mqcl0RZVk5dMCOVEQd'
+value='a0zBgxBevaBugotGpNKI6kMPsIsBbH44'
/>

Changes like these are not interesting or useful to review.

Is there a git diff option to ignore this kind of change? An alternative would be to rank the differences by similarity. So far I have been running git diff --word-diff=porcelain --unified=0 HEAD~1 HEAD and then post-processing the output to extract the changes, compute the Levenshtein distance, and remove duplicates. That helps, but it is not a great solution, considering that git already knows which lines should be compared and provides a configurable number of context lines.

Upvotes: 1

Views: 917

Answers (1)

VonC

Reputation: 1323115

You could try writing a driver that ignores specific patterns, for instance a clean filter declared in .gitattributes.
See this discussion as an example.

echo '*.html filter=ignore_value' >> .gitattributes
git config filter.ignore_value.clean "sed -e '/^value=.*$/d'"

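That is just a first draft: the value attribute might not be at the start of a line, so you need to adjust the regex to detect and ignore every line containing a change you wish to skip. Keep in mind that a clean filter runs when files are staged, so the matching lines are also dropped from the committed content.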
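
If you only want to change what git diff displays and leave the stored files alone, a textconv diff driver may be a better fit. Here is a minimal sketch, run in a throwaway repository. The driver name ignore_noise, the file page.html, the paragraph lines, and the two sed patterns are assumptions of mine, based on the example in the question; adjust them to your own documents.

```shell
#!/bin/sh
# Demo in a throwaway repository so nothing touches a real project.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

# Hypothetical driver name "ignore_noise". textconv receives the file path as
# its last argument, and git diffs the command's output instead of the raw file.
echo '*.html diff=ignore_noise' > .gitattributes
git config diff.ignore_noise.textconv "sed -E -e '/value=/d' -e '/^app-[0-9]+</d'"

# First commit: a token line, a build label, and one real paragraph.
printf '%s\n' "value='82NB5DdySoICu1mqcl0RZVk5dMCOVEQd'" 'app-151</a>' '<p>old text</p>' > page.html
git add .gitattributes page.html
git commit -q -m first

# Second commit: token and build label change (noise), paragraph changes (real).
printf '%s\n' "value='a0zBgxBevaBugotGpNKI6kMPsIsBbH44'" 'app-107</a>' '<p>new, longer text</p>' > page.html
git commit -q -am second

# The sed patterns strip the noisy lines from both sides before comparison,
# so only the paragraph change should be left in the diff.
result=$(git diff --unified=0 HEAD~1 HEAD -- page.html)
printf '%s\n' "$result"
```

Since the driver only affects how the diff is rendered, git diff --no-textconv still shows the raw changes whenever you need them.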

The OP, Robert Smith, points out in the comments a more complete command:

git diff --unified=0 HEAD~1 HEAD | grep -v -E -f PATTERNS.txt
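
The contents of PATTERNS.txt are not shown in the comments, so the following is only a guess at what such a file could look like for the example in the question. Both regexes, the helper name filter_noise, and the extra paragraph line in the sample are my own assumptions.

```shell
#!/bin/sh
# Hypothetical PATTERNS.txt: one extended regex per line. grep -v drops every
# diff line that matches at least one of them.
cat > PATTERNS.txt <<'EOF'
^[-+]value='[A-Za-z0-9]+'
^[-+]app-[0-9]+</a>
EOF

# Helper (illustrative name) so the same filter can be fed from git or a file.
filter_noise() {
    grep -v -E -f PATTERNS.txt
}

# Sample diff lines taken from the question, plus one made-up real change.
sample="-value='82NB5DdySoICu1mqcl0RZVk5dMCOVEQd'
+value='a0zBgxBevaBugotGpNKI6kMPsIsBbH44'
-app-151</a>
+app-107</a>
+<p>new text</p>"

# Only the line that matches none of the patterns survives.
filtered=$(printf '%s\n' "$sample" | filter_noise)
printf '%s\n' "$filtered"
```

In a repository you would run git diff --unified=0 HEAD~1 HEAD | filter_noise instead of using the sample text. This only removes the matching lines: the file headers and @@ hunk headers of fully filtered hunks stay in the output. Newer Git releases (2.30 and later, as far as I know) also accept git diff -I<regex>, which skips a hunk when all of its changed lines match the regex.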

Upvotes: 1
