Philip Oakley

Reputation: 14061

Checking for duplicate file (contents) in git?

In my 'project/repo' I have two MS Visual Studio projects, one for the main code, and an independent one for tests. I have some files that are common to both (in the copy and paste sense) and I'd like to see / check which ones they are.

What are the right Git commands (or GUI menu clicks) to see whether I have used the same content blob twice in the overall repo tree? If I have read the tutorials correctly, Git should store a single SHA1 for two copies of the same file content, so it already knows about the duplication. I am hoping Git has a command that finds and displays the file paths that share a blob.

Eventually I'd like to be able to find out the diffs between the versions when there is a common ancestor blob SHA1 (but not a common location). [i.e. during testing one version gets updated ahead of the other version...]

I know it isn't best practice to have such duplicates, but it is the way the work has ended up :-(

I have msysGit and GitExtensions on Windows...

Upvotes: 3

Views: 2683

Answers (1)

manojlds

Reputation: 301147

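You can run something like

git ls-tree -r HEAD

to list every blob in the tree next to its file path. Each output line shows the mode, the object type, the SHA1 and the path, so files with identical content show the same SHA1.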

If you don't want to pick out the matching blobs by eye, sort the listing by SHA1 and highlight the repeats:

git ls-tree -r HEAD |
    sort -t ' ' -k 3 |
        perl -ne '$1 && / $1\t/ && print "\e[0;31m" ; / ([0-9a-f]{40})\t/; print "$_\e[0m"'

From: Git: Find duplicate blobs (files) in this tree
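
If you would rather see only the duplicates, grouped by blob, here is a rough sketch that uses awk instead of colour codes. The function name list_duplicate_blobs is made up for this example, and it assumes the default ls-tree output format (mode, type and SHA1 separated by spaces, then a tab, then the path):

```shell
# Hypothetical helper (name invented for this example): reads
# `git ls-tree -r` output on stdin and prints each blob SHA1 that is
# used by more than one path, followed by the indented paths.
list_duplicate_blobs() {
    awk -F '\t' '
        {
            # $1 is "mode type sha", $2 is the path
            split($1, meta, " ")
            if (meta[2] != "blob") next      # skip submodule (commit) entries
            sha = meta[3]
            count[sha]++
            paths[sha] = paths[sha] "    " $2 "\n"
        }
        END {
            # Group order is unspecified; only blobs seen 2+ times are printed
            for (sha in count)
                if (count[sha] > 1)
                    printf "%s\n%s", sha, paths[sha]
        }
    '
}
```

Inside the repository you would call it as: git ls-tree -r HEAD | list_duplicate_blobs. Note that all empty files share one blob, so they will show up as a single group.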

Upvotes: 7
