As I understand diff algorithms, the difference between any two files can usually be represented by more than one valid diff. The diff utility makes a tradeoff between minimality, readability, and performance when it generates the diff between two files.
My question is: given default options to GNU diff (from diffutils), the same version of diff, and the same two files as input, does diff always generate the same output? Or does it make any heuristic choices that might make the output nondeterministic?
As a secondary question: if the output is deterministic, how frequently does a new version of diff produce different output for the same files, again assuming default options? And if the output is not deterministic, are there any options that guarantee deterministic output?
I am working with some very fragile tests written by a third party that diff two diffs and expect them to be identical. The tests fail intermittently. I am trying to determine whether I can rule out nondeterminism in the output of the diff utility itself as a potential cause.
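For reference, here is a minimal sketch of how I could check this empirically: run the same command many times and count the distinct outputs. The helper name `distinct_outputs` and the run count are my own placeholders, not anything from the third-party tests, and a clean result only fails to find nondeterminism rather than proving its absence.

```python
import hashlib
import subprocess


def distinct_outputs(cmd, runs=20):
    """Run cmd repeatedly and return the set of SHA-256 digests of its stdout.

    `distinct_outputs` is a hypothetical helper for illustration only.
    A single digest means every run produced byte-identical output.
    """
    digests = set()
    for _ in range(runs):
        # diff exits with status 1 when the files differ, so a non-zero
        # exit code is expected here and check=True must not be used.
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        digests.add(hashlib.sha256(result.stdout).hexdigest())
    return digests
```

For example, `len(distinct_outputs(["diff", "old.txt", "new.txt"])) == 1` would mean all 20 runs agreed, where `old.txt` and `new.txt` stand in for the real inputs. Running it against the exact files from a failing test run would be the most useful check.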