Michail Michailidis

Reputation: 12181

How to use git shortlog to aggregate user commit stats over multiple repositories in a single directory?

I have a directory with a lot of Git repo subdirectories in it and I would like to accumulate information similar to

git shortlog -sne --no-merges

for all the repos in it, sorting the users by their total commits across all repos.

e.g. for repo 1:

430 Author 1 <[email protected]>
 20 Author 2 <[email protected]>

e.g. for repo 2:

123 Author 1 <[email protected]>
 92 Author 2 <[email protected]>

total result:

553 Author 1 <[email protected]>
112 Author 2 <[email protected]>

Is it possible to do that with Git's built-in tools?

From outside the repo folders, I was able to run it for a single repo:

git -C repoFolder shortlog -sne --no-merges

Upvotes: 2

Views: 3525

Answers (1)

phd

Reputation: 94453

Loop over every subdirectory, run git shortlog in each one with git -C, and process the combined output with awk:

for d in */; do git -C "$d" shortlog -ens --no-merges; done |
    awk '{name_email=""; for (i=2; i<=NF; i++) {name_email=name_email " " $i}; count_by_user[name_email]+=$1} END {for (name_email in count_by_user) print count_by_user[name_email], name_email}'
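
One caveat when the loop runs from a script, a cron job, or anywhere else standard input is not a terminal: git shortlog with no revision argument then reads log text from standard input instead of walking the history. A slightly more defensive variant of the loop, as a sketch only: it names HEAD explicitly and skips directories without a .git entry, so it assumes ordinary non-bare repositories. The function name shortlog_all is made up for this illustration.

```shell
# shortlog_all (hypothetical helper name): print per-author commit counts
# for every Git repository directly under the current directory.
shortlog_all() {
    for d in */; do
        # Assumption: non-bare repos, which have a .git entry at top level.
        # $d ends with a slash, so "${d}.git" expands to e.g. "repo/.git".
        [ -e "${d}.git" ] || continue
        # Name HEAD explicitly so shortlog walks the history even when
        # stdin is not a terminal; hide errors from repos with no commits.
        git -C "$d" shortlog -sne --no-merges HEAD 2>/dev/null
    done
}
```

Its output can be piped into the same awk program as above, for example shortlog_all | awk '...'.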

The awk script explained:

name_email="";

For every input line: start with empty variable name_email.

for (i=2; i<=NF; i++) {name_email=name_email " " $i};

Join every field from the second one onward, separated by spaces, into name_email. In other words, rebuild the full "name <email>" string, leaving out the leading commit count.

count_by_user[name_email]+=$1

count_by_user is an associative array that awk creates on first use. For every input line, the entry for that name+email is increased (starting from the default 0) by the value of the first field, the commit count.

END {for (name_email in count_by_user) print count_by_user[name_email], name_email}

At the end, print the results: iterate over the indices of count_by_user (name+email) and print each total followed by its name+email. The results come out unsorted. They could be sorted within the awk script itself, or by piping the output through | sort -nr.
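
As a runnable sketch of the summing and sorting steps, the snippet below feeds the two sample listings from the question through a slightly different awk program and then sort -nr. Instead of re-joining fields, this variant strips the leading count from the whole line, so the key has no leading space and any unusual spacing inside a name is preserved. The function name sum_shortlog and the example.com addresses are placeholders, not part of the original answer.

```shell
# sum_shortlog (hypothetical name): read `git shortlog -sne` lines on stdin,
# add up the counts per "Name <email>" key, and print "total<TAB>key".
sum_shortlog() {
    awk '{
        count = $1                        # first field: commit count
        sub(/^[ \t]*[0-9]+[ \t]+/, "")    # drop the count, keep "Name <email>"
        total[$0] += count                # accumulate per author
    }
    END { for (who in total) print total[who] "\t" who }'
}

# Sample input mirroring the two repos in the question (placeholder
# addresses), in shortlog's "count<TAB>author" layout, highest total first.
printf '%6d\t%s\n' \
    430 'Author 1 <author1@example.com>' \
     20 'Author 2 <author2@example.com>' \
    123 'Author 1 <author1@example.com>' \
     92 'Author 2 <author2@example.com>' |
    sum_shortlog | sort -nr
```

With this sample input, Author 1 comes out first with 553 commits and Author 2 second with 112.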

Developed with gawk, the GNU version of awk.

Upvotes: 7
