Ben_G

Reputation: 826

Is a Git repo with tens or hundreds of thousands of commits just too big?

I've been tasked with migrating our entire PVCS repository to Git, including all of the history. The only approach I've come up with is to run a PVCS VLOG command to extract the revision history for all files into a text file, then parse that file (using a C# program) to get the list of revisions for each file.

Then, revision by revision, I GET the given revision of the file from PVCS, ADD the file to Git, and do a COMMIT. So for each of the ~14,000 files I will have a commit for each revision of the file (and each file could have from 1 to 100+ revisions). Am I crazy in thinking this will work? Are there just going to be too many commits, making the repo too large and unwieldy?
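
Roughly, the loop I have in mind looks like this (a sketch only; the PVCS get syntax is my best guess, and Run is just a shell-out helper):

    // Sketch of the per-revision loop. The PVCS "get -r" syntax is my best
    // guess; the Git side is a plain add/commit per file revision.
    using System.Diagnostics;

    class PerRevisionImport
    {
        static void Run(string cmd, string args, string workDir)
        {
            var psi = new ProcessStartInfo(cmd, args)
            {
                WorkingDirectory = workDir,
                UseShellExecute = false
            };
            using (var p = Process.Start(psi)) { p.WaitForExit(); }
        }

        // One Git commit per PVCS file revision: ~14,000 files x 1-100+ revisions each.
        static void CommitRevision(string repoDir, string file, string rev, string message)
        {
            Run("get", $"-r{rev} \"{file}\"", repoDir);   // fetch this revision from PVCS (syntax approximate)
            Run("git", $"add \"{file}\"", repoDir);
            Run("git", $"commit -m \"{message}\"", repoDir);
        }
    }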

Upvotes: 1

Views: 543

Answers (1)

Joseph K. Strauss

Reputation: 4903

Disclaimer: I am not familiar with PVCS in particular.

However, I have dealt with a similar issue converting CVS to Git. There is a Git command, git cvsimport, which groups per-file commits into changesets based on time, committer, and message. If there is a tool that can convert PVCS to CVS or Subversion (Git has git svn for Subversion as well), then just convert in two steps.

Otherwise, I would suggest modifying your program as follows:

  • Sort all revisions (across all files) by date
  • For each revision:
    • If its committer, date, or message differs from the previous revision, start a new commit
    • Get the file content for that revision and add it to the current commit

Obviously, the dates will not match exactly, so decide on a tolerance for what counts as the same commit. You may also want to treat similar commit messages as the same commit if, for instance, they share the same bug-tracking number.
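
Here is a sketch of that grouping, assuming a FileRevision record parsed from your VLOG output; the five-minute window is an arbitrary tolerance you would tune:

    // Fold per-file revisions into changesets: same committer and message,
    // with timestamps inside a tolerance window, belong to one commit.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    record FileRevision(string Path, string Revision, string Committer,
                        string Message, DateTime Date);

    static class ChangesetBuilder
    {
        static readonly TimeSpan Window = TimeSpan.FromMinutes(5); // arbitrary; tune for your history

        public static List<List<FileRevision>> Group(IEnumerable<FileRevision> revisions)
        {
            var changesets = new List<List<FileRevision>>();
            List<FileRevision> current = null;

            foreach (var rev in revisions.OrderBy(r => r.Date))
            {
                bool startNew = current == null
                    || rev.Committer != current[0].Committer
                    || rev.Message != current[0].Message
                    || rev.Date - current[current.Count - 1].Date > Window;

                if (startNew)
                {
                    current = new List<FileRevision>();
                    changesets.Add(current);
                }
                current.Add(rev);
            }
            return changesets;
        }
    }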

Consider using git fast-import, which bypasses the index for much faster processing.
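
For example, each grouped changeset can be written to git fast-import's standard input as a commit followed by inline file contents. A minimal sketch (the writer must be UTF-8, since the data command takes a byte count):

    // Emit one changeset in git fast-import's stream format.
    using System;
    using System.IO;
    using System.Text;

    static class FastImportWriter
    {
        public static void Emit(TextWriter w, string name, string email,
                                DateTimeOffset when, string message,
                                (string Path, string Content)[] files)
        {
            w.Write("commit refs/heads/master\n");
            w.Write($"committer {name} <{email}> {when.ToUnixTimeSeconds()} +0000\n");
            WriteData(w, message);
            foreach (var (path, content) in files)
            {
                w.Write($"M 100644 inline {path}\n"); // inline blob, normal file mode
                WriteData(w, content);
            }
            w.Write("\n");
        }

        static void WriteData(TextWriter w, string s)
        {
            w.Write($"data {Encoding.UTF8.GetByteCount(s)}\n{s}\n");
        }
    }

Pipe the resulting stream into git fast-import from inside the target repository; it writes objects straight to the object database without touching the index or the working tree.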

Upvotes: 1
