Reputation: 185
I would like to produce a graph or table of the total repository size vs time (or commit).
Is there a git command or tool that does this? I have tried git log, but it does not seem to have an option to export the size of the commits.
Upvotes: 10
Views: 2628
Reputation: 1323613
Instead of trying to measure "size" (which does not make sense with a Git repo, as explained by poke), you could visualize "code frequency" (i.e. the "size" of contributions in terms of lines added or removed over time).
The idea comes from "Introducing the New GitHub Graphs"
See "Stupid Git Trick - getting contributor stats", except you wouldn't necessarily use the --author option with git log --numstat, but you can combine git log with the --since and --until options.
Something like:
git log --since "OCT 4 2011" --until "OCT 11 2011" --pretty=tformat: --numstat | \
gawk '{ add += $1 ; subs += $2 ; loc += $1 - $2 } END \
{ printf "added lines: %s removed lines: %s total lines: %s\n",add,subs,loc }' -
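To get a series over time rather than a single total, the same git log --numstat output can be grouped by day. The following is only a rough Python sketch of that idea: the function names net_lines_by_day and read_numstat_log are made up for illustration, and the "@" prefix on the date lines is an arbitrary marker chosen here so that commit headers can be told apart from numstat lines.

```python
import subprocess


def net_lines_by_day(log_text):
    """Group `git log --numstat` output by day.

    Assumes each commit starts with a header line '@YYYY-MM-DD', as produced
    by --pretty=tformat:@%ad --date=short. Returns {day: (added, removed)}.
    """
    totals = {}
    day = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            day = line[1:].strip()
            totals.setdefault(day, [0, 0])
        elif line.strip() and day is not None:
            added, removed, _path = line.split("\t", 2)
            # Binary files are reported as "-"; count them as zero lines.
            totals[day][0] += int(added) if added.isdigit() else 0
            totals[day][1] += int(removed) if removed.isdigit() else 0
    return {d: tuple(v) for d, v in totals.items()}


def read_numstat_log(repo="."):
    """Run git log in `repo` and return the text net_lines_by_day expects."""
    return subprocess.check_output(
        ["git", "-C", repo, "log", "--no-merges", "--date=short",
         "--pretty=tformat:@%ad", "--numstat"],
        encoding="utf-8", errors="replace")


# Small demo on hand-written sample text (no repository needed).
sample_log = "@2011-10-05\n\n3\t1\ta.txt\n-\t-\timg.png\n@2011-10-04\n\n10\t0\tb.txt\n"
for day, (added, removed) in sorted(net_lines_by_day(sample_log).items()):
    print(day, added, removed, sep=",")
```

Calling net_lines_by_day(read_numstat_log()) inside a repository and keeping a running sum of added minus removed gives a "lines of code over time" curve that can be pasted into a spreadsheet.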
Upvotes: 5
Reputation: 1836
I put together a Python script that creates a CSV and gets you there quite easily: https://gist.github.com/StefanoChiodino/3f749424403c6070dae72dc308724dba
Having looked at the output, I believe it overestimates the size of each commit for some reason. Anyway, it's quite useful for analysis.
#!/usr/bin/env python3
import csv
import subprocess

NULL_ID = "0000000000000000000000000000000000000000"  # blob id of a deleted file


def git(*args):
    """Run a git command and return its output as text."""
    return subprocess.check_output(("git",) + args, encoding="utf-8", errors="replace")


# One line per non-merge commit: hash|author|ISO author date|subject
git_rev_list = git("log", "--no-merges", "--pretty=%H|%an|%aI|%s", "origin/master").splitlines()

with open("commit_stats.csv", "w", newline="") as file_handler:
    writer = csv.writer(file_handler)  # do not shadow the csv module
    writer.writerow(["id", "author", "size", "file count", "date", "comment"])
    for git_rev in git_rev_list:
        try:
            commit_id, author, date, comment = git_rev.split("|", 3)
        except ValueError:
            continue  # skip lines that do not have all four fields
        # Raw diff against the parent, one line per changed file.
        diff_lines = git("diff-tree", "-r", "-c", "-M", "-C", "--no-commit-id", commit_id).splitlines()
        commit_size = 0
        for diff in diff_lines:
            try:
                blob_id = diff.split(" ")[3]  # fourth field: id of the new blob
                if blob_id != NULL_ID:
                    # Full size of the new blob, not the size of the change.
                    commit_size += int(git("cat-file", "-s", blob_id))
            except (IndexError, ValueError, subprocess.CalledProcessError):
                pass  # e.g. submodule entries, whose objects are not in this repository
        writer.writerow([commit_id, author, commit_size, len(diff_lines), date, comment])
Upvotes: 0
Reputation: 11571
Apart from coming up with a sensible answer to "what does repository size even mean?", you also have to consider what time should mean to you. The author date of commits? The commit date? When the commit actually became reachable by a particular branch? As opposed to other version control systems, time is often less useful in a Git context.
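To see how far the two dates actually drift apart in a given repository, both can be printed side by side. This is just a sketch: parse_dates and read_dates are hypothetical helper names, and the "|" separated layout is an arbitrary choice built from git's %H, %at (author timestamp) and %ct (committer timestamp) placeholders.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def parse_dates(log_text):
    """Parse 'hash|author_ts|commit_ts' lines into
    (hash, author_date, commit_date, lag) tuples, dates in UTC."""
    rows = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        commit_id, author_ts, commit_ts = line.split("|")
        authored = datetime.fromtimestamp(int(author_ts), tz=timezone.utc)
        committed = datetime.fromtimestamp(int(commit_ts), tz=timezone.utc)
        rows.append((commit_id, authored, committed, committed - authored))
    return rows


def read_dates(repo=".", rev="HEAD"):
    """Return author and committer timestamps for every commit reachable from rev."""
    return subprocess.check_output(
        ["git", "-C", repo, "log", "--pretty=tformat:%H|%at|%ct", rev],
        encoding="ascii")


# Demo on a hand-written line: authored one week before it was committed.
for commit_id, authored, committed, lag in parse_dates("abc123|1317729600|1318334400\n"):
    print(commit_id, authored.date(), committed.date(), lag)
```

Sorting the result of parse_dates(read_dates()) by lag shows which commits were rebased, cherry-picked or applied from patches long after they were written.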
The date in git log output is normally the author date, the point in time when the commit was created for the first time. Commits can bake for days, weeks or even months before even leaving the developer's machine.
A commit applied from git format-patch output will get a new commit date even though the author date stays the same. Still, with work taking place in separate branches the commit date can be months old.
Upvotes: 3
Reputation: 387557
The size of a commit is very hard to define. First of all, most commits recycle a lot of existing Git objects. If you don’t change a file between revision A and B, should the size of B include the size of that file? Also, the repository size itself is not that easily determined either. Due to Git’s compression system, it will repack objects from time to time. The way it does that can be influenced by multiple things, so it might not pack the same way if you do it again, resulting in a different total size.
What you could do is check the size of the checked-out tree of every revision. But of course the result you will get there will be far away from the repository’s size itself.
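One way to do that without actually checking anything out is to ask git for the blob sizes recorded in each revision's tree. The sketch below assumes that approach; tree_size and sizes_over_history are names invented for this example, and it reports uncompressed file sizes, which is not the same thing as the size of the .git directory.

```python
import subprocess


def tree_size(ls_tree_text):
    """Sum blob sizes from `git ls-tree -r -l <rev>` output.

    Each line has the form: mode, type, object id, size, then a tab and the
    path. Returns (total_bytes, file_count). Submodule entries have type
    'commit' and size '-', so they are skipped.
    """
    total = 0
    count = 0
    for line in ls_tree_text.splitlines():
        meta, _, _path = line.partition("\t")
        fields = meta.split()
        if len(fields) == 4 and fields[1] == "blob":
            total += int(fields[3])
            count += 1
    return total, count


def sizes_over_history(repo=".", rev="HEAD"):
    """Yield (commit_id, (total_bytes, file_count)) from oldest to newest."""
    revs = subprocess.check_output(
        ["git", "-C", repo, "rev-list", "--reverse", "--first-parent", rev],
        encoding="ascii").split()
    for commit_id in revs:
        listing = subprocess.check_output(
            ["git", "-C", repo, "ls-tree", "-r", "-l", commit_id],
            encoding="utf-8", errors="replace")
        yield commit_id, tree_size(listing)


# Demo on hand-written ls-tree output (no repository needed).
sample_listing = (
    "100644 blob aaaa     120\tREADME\n"
    "100755 blob bbbb    2048\tbin/run.sh\n"
    "160000 commit cccc       -\tvendor/lib\n"
)
print(tree_size(sample_listing))
```

Iterating over sizes_over_history() runs one git ls-tree per commit, so on a long history it can take a while; sampling every Nth commit is usually enough for a graph.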
Upvotes: 8
Reputation: 6994
Git doesn't provide such a feature yet.
The best solution would be to iterate over the log, grep the file sizes and add them together.
There is a solution written in Perl by one of the makers of BitBucket (Daniel Rohan):
https://confluence.atlassian.com/plugins/servlet/mobile#content/view/292651328
Upvotes: 5