Reputation: 909
I've been trying to find all the authors of a git project so that I can ask them about relicensing their commits. There is no point in contacting every author, since some may have contributed code that has since been removed. I only want to contact the authors whose commits are still visible in the current HEAD.
I was told that git log had this capability, but I couldn't find anything on it except for something like:
git log --format='%an <%ae>'
This does roughly what I want, except that it doesn't exclude authors without code in the current codebase.
How can I achieve this?
Upvotes: 10
Views: 4430
Reputation: 90286
The git-fame program lists contributors by their number of surviving lines of code (loc), which matches what's requested.
Install it with pip install git-fame
~$ git fame --cost hour,month
Blame: 100%|██████████| 74/74 [00:00<00:00, 96.51file/s]
Total commits: 1173
Total ctimes: 1055
Total files: 180
Total hours: 255.1
Total loc: 2716
Total months: 8.7
| Author | hrs | mths | loc | coms | fils | distribution |
|:---------------------------|------:|-------:|------:|-------:|-------:|:----------------|
| Casper da Costa-Luis | 100 | 7 | 2171 | 770 | 63 | 79.9/65.6/35.0 |
| Stephen Larroque | 16 | 1 | 243 | 202 | 19 | 8.9/17.2/10.6 |
| Kyle Altendorf | 6 | 0 | 41 | 31 | 3 | 1.5/ 2.6/ 1.7 |
| Guangshuo Chen | 2 | 0 | 35 | 18 | 6 | 1.3/ 1.5/ 3.3 |
| Matthew Stevens | 2 | 0 | 32 | 3 | 2 | 1.2/ 0.3/ 1.1 |
| Noam Yorav-Raphael | 3 | 0 | 23 | 11 | 4 | 0.8/ 0.9/ 2.2 |
| Daniel Panteleit | 2 | 0 | 16 | 2 | 2 | 0.6/ 0.2/ 1.1 |
| Mikhail Korobov | 2 | 0 | 15 | 11 | 6 | 0.6/ 0.9/ 3.3 |
| Hadrien Mary | 3 | 0 | 15 | 31 | 10 | 0.6/ 2.6/ 5.6 |
| Johannes Hansen | 2 | 0 | 14 | 1 | 2 | 0.5/ 0.1/ 1.1 |
Upvotes: 1
Reputation: 471
You can use git blame to determine the list of current contributors:
#!/bin/sh
set -e
IFS='
'
for f in $(git ls-tree -r --name-only "${1:-HEAD}"); do
    # Blame at the same revision that was listed, not the working tree.
    git blame -w -C -p "${1:-HEAD}" -- "$f" | sed -n \
        -e '/^author /{ s/^author //; h; }' \
        -e '/^author-mail /{ s/^author-mail //; H; x; s/\n/ /p; }'
done | sort -u
You should pass -w to ignore whitespace changes and -C to follow moves and copies to achieve more accurate attribution. Also, running it with -p outputs in porcelain mode for robust parsing of the output (i.e., it will not get tripped up by spaces in file names).
Note that while -w and -C should give you better results than without them, this is still just a heuristic and may not be sufficient for your purposes.
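For relicensing it can also help to know how much of each author's code survives, not just whether any does. The following is a minimal sketch along the same lines as the script above, assuming it is run from the repository root and that xargs supports -0 (GNU and BSD do). The function name surviving_line_counts is made up for illustration; it is not a git command.

```shell
#!/bin/sh
# Hypothetical helper: count surviving lines per author e-mail at a
# revision (default: HEAD). Run from the repository root.
surviving_line_counts() {
    rev=${1:-HEAD}
    # -z / -0 keep file names with spaces or newlines intact.
    git ls-tree -r -z --name-only "$rev" |
        xargs -0 -n 1 git blame -w -C --line-porcelain "$rev" -- |
        # --line-porcelain repeats the full header for every line, so each
        # surviving line yields exactly one author-mail header. Content
        # lines start with a TAB and therefore never match.
        sed -n 's/^author-mail //p' |
        sort | uniq -c | sort -rn
}

# Usage: surviving_line_counts        # counts at HEAD
#        surviving_line_counts v1.0   # counts at a tag or other revision
```

The same caveat applies as for the script above: the counts are only as good as the blame heuristics, and entries such as submodules that git blame cannot process will produce errors.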
Upvotes: 2
Reputation: 625
Simply use shortlog (see https://git-scm.com/docs/git-shortlog):
$ git shortlog -se
26 Bart Simpson <[email protected]>
6 Homer Simpson <[email protected]>
103 Lisa Simpson <[email protected]>
34 Marge Simpson <[email protected]>
This will print out a list of all authors in your history sorted alphabetically, including email addresses and the number of commits per author.
By default this analyzes the history leading to HEAD, i.e. everything leading to the current commit.
I think this is exactly what you want.
You can sort by commit count using -n to find out the most important committers.
$ git shortlog -sen
103 Lisa Simpson <[email protected]>
34 Marge Simpson <[email protected]>
26 Bart Simpson <[email protected]>
6 Homer Simpson <[email protected]>
Upvotes: 1
Reputation: 3758
IANAL, but as for the relicensing, I am not so sure that it is enough to have only the permission of the authors who have code in the current project. After all, their contributions / commits somehow led to the current state of the project.
That aside, you may want to take a look at git blame. It shows which line of a file was introduced in which commit by which author. This should get you closer to the solution of your problem. Maybe some additional post processing with awk ... | sort | uniq can do the rest.
However, git blame only shows information for a single file, so you would have to repeat that for all files in the repository.
In the root directory of the Git repository, you could use a shell command like this on Linux systems:
find . -name '*.cpp' -print0 | xargs -0 -I {} git blame --show-email -f {} | awk '{ print $3 }' | sort | uniq
This searches for C++ source files (extension *.cpp) with find and runs git blame on each of those files. The option --show-email of git blame shows e-mail addresses instead of names, which are easier to filter for, because names can consist of several words, while an address is usually just one. The option -f makes git blame always print the file name, so the columns stay in the same position for every file (as long as the file names contain no spaces). awk then prints only the third column of the output, which is the mail address. (The first is the short commit hash, the second is the file name.) Finally, sort | uniq is used to get rid of duplicates, showing each address only once.
(Untested, but it may point you in the right direction.)
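Counting columns with awk is fragile, so here is a sketch of the same idea that reads the author-mail header from git blame --line-porcelain instead. It assumes a clean checkout (no staged or modified files), is run from the repository root, and uses git ls-files so that only tracked files are blamed. The name blame_emails is hypothetical, and the '*.cpp' default just mirrors the example above.

```shell
#!/bin/sh
# Hypothetical helper: unique author e-mails for tracked files matching
# the given pathspecs (default: '*.cpp', as in the find example).
blame_emails() {
    [ "$#" -gt 0 ] || set -- '*.cpp'
    # Git pathspecs such as '*.cpp' also match files in subdirectories.
    git ls-files -z -- "$@" |
        xargs -0 -n 1 git blame --line-porcelain -- |
        sed -n 's/^author-mail //p' |
        sort -u
}

# Usage: blame_emails                # all tracked *.cpp files
#        blame_emails '*.h' '*.py'   # other pathspecs
```

If no tracked file matches, GNU xargs still runs git blame once without a file and prints a usage error, so treat an empty match as a case to check by hand.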
If you just want every author who ever committed anything to the repository, just use
git log --format='%an <%ae>' | sort | uniq
instead.
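To see which authors the two approaches disagree on, the full history list can be compared against the blame list. This is only a sketch: the function name authors_without_surviving_lines is invented here, it compares by e-mail address, it assumes it is run from the repository root, and it uses %aE (the mailmap-aware author e-mail) on the assumption that this matches what git blame reports.

```shell
#!/bin/sh
# Hypothetical helper: list author e-mails that appear in the history of
# a revision (default: HEAD) but own no surviving line according to blame.
authors_without_surviving_lines() {
    rev=${1:-HEAD}
    all=$(mktemp) && live=$(mktemp) || return 1
    # Every author e-mail in the history leading to $rev.
    git log --format='<%aE>' "$rev" | LC_ALL=C sort -u > "$all"
    # Every author e-mail that still owns at least one line at $rev.
    git ls-tree -r -z --name-only "$rev" |
        xargs -0 -n 1 git blame -w -C --line-porcelain "$rev" -- |
        sed -n 's/^author-mail //p' | LC_ALL=C sort -u > "$live"
    # comm -23 prints the lines that occur only in the first file.
    LC_ALL=C comm -23 "$all" "$live"
    rm -f "$all" "$live"
}

# Usage: authors_without_surviving_lines
```

Whether these authors can really be skipped is a legal question, as noted at the top of this answer; the output only says that blame no longer attributes any line to them.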
Upvotes: 10