Reputation: 1018
(I know similar questions have been asked, e.g. "GitHub API - how to compare 2 commits", but I don't think this is a duplicate.)
As part of our build process we need to compare two commits in GitHub and iterate through every file changed between them. There is a lovely API for comparing commits, but it silently maxes out at 300 file changes, and while the API supports pagination, you can only page through the list of commits, not the associated list of files. All my googling suggests that neither the gh CLI nor the GraphQL API supports diffing commit IDs either.
As best I can tell my options are
git diff $lastReleaseHash...$newReleaseHash --name-status
at the command line, which just seems inefficient. Surely there are better options?!
Upvotes: 5
Views: 1077
Reputation: 1329572
Looking for a solution that does not require a full clone. Can multiple GitHub API calls be combined to provide a good solution to this problem?
I did mention in "GitHub API — how to compare 2 commits" that the compare API silently maxes out at 300 files shown, as stated in the OP.
But I do not know of a better option.
I tried and tested my implementation: VonC/pgdiff, using PyGithub.
from tqdm import tqdm  # progress bars


# Excerpt wrapped in a function so that it runs on its own; the name
# get_changed_files is illustrative. repo is a PyGithub Repository object.
def get_changed_files(repo, base_commit, head_commit):
    all_changed_files = {}  # filename -> status

    # The compare API pages through commits, so walk them one by one.
    comparison = repo.compare(base_commit, head_commit)
    commits_list = list(comparison.commits)
    total_commits = len(commits_list)
    print(f"Total commits found: {total_commits}")

    with tqdm(
        total=total_commits, desc="Processing commits", position=0
    ) as pbar_commits:
        for commit in commits_list:
            pbar_commits.update(1)
            files_list = list(commit.files)
            total_files = len(files_list)
            pbar_commits.write(
                f"Total files found for commit '{commit.sha}': {total_files}"
            )
            for file in files_list:
                filename = file.filename
                status = file.status
                if status == "renamed":
                    # Record a rename as a deletion plus an addition.
                    previous_filename = file.previous_filename
                    all_changed_files[previous_filename] = "deleted"
                    all_changed_files[filename] = "added"
                else:
                    all_changed_files[filename] = status

    sorted_changed_files = dict(sorted(all_changed_files.items()))
    return sorted_changed_files
The PyGithub Commit.files property is paginated, so it should not be limited to 300 files.
I get:
(python_3.12.4) C:\Users\VonC\git\pgdiff>python pgdiff.py
Total commits found: 45
Total files found for commit 'd70600526e2efbea45eeb9dcb55c13f5e0ceba1f': 9
Total files found for commit '155dc8447d3590ea856bb17919bfc85172b52e09': 3
...
Total files found for commit 'a116aba5d54bf44c6fc27fa1a4c2431d53cf8ff5': 1
Processing commits: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 45/45 [00:20<00:00, 2.15it/s]
Added files:
t/unit-tests/lib-reftable.c
t/unit-tests/lib-reftable.h
t/unit-tests/t-reftable-reader.c
Removed files:
reftable/reftable-tests.h
reftable/stack_test.c
reftable/test_framework.c
reftable/test_framework.h
t/t0032-reftable-unittest.sh
Modified files:
.gitlab-ci.yml
Documentation/RelNotes/2.47.0.txt
Makefile
...
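One thing to watch with a commit-by-commit walk like the one above: the dictionary keeps whatever status the last processed commit reported for a path, so a file that is added in one commit and removed in a later one is still listed, even though it does not differ between the two endpoints. The following is a minimal sketch, not part of pgdiff, of folding the per-commit statuses into a net status. It assumes the commits are processed oldest first and that the statuses are the strings "added", "removed", "modified" and "renamed"; fold_status and net_changes are hypothetical names, and merge commits or edits that restore the original content are not handled.

```python
def fold_status(previous, current):
    """Combine the net status so far with the status from a later commit.

    Returns None when the two cancel out (added, then removed).
    """
    if previous is None:
        return current
    if current == "removed":
        # Added-then-removed cancels out; otherwise the file is gone.
        return None if previous == "added" else "removed"
    if previous == "added":
        # Added, then touched again: still new relative to the base.
        return "added"
    # Previously removed or modified, and present again now:
    # the file exists at both ends, so report it as modified.
    return "modified"


def net_changes(commits):
    """Fold per-commit file lists into one net change set.

    commits: oldest-first list of lists of
             (filename, status, previous_filename) tuples.
    """
    net = {}
    for files in commits:
        for filename, status, previous_filename in files:
            if status == "renamed":
                # Treat a rename as removing the old path and adding the new one.
                steps = [(previous_filename, "removed"), (filename, "added")]
            else:
                steps = [(filename, status)]
            for name, step_status in steps:
                merged = fold_status(net.get(name), step_status)
                if merged is None:
                    net.pop(name, None)
                else:
                    net[name] = merged
    return dict(sorted(net.items()))
```

With PyGithub, the tuples could be built from each file's filename, status and previous_filename attributes, the same ones used in the excerpt above. A file that is added and later removed within the range then drops out of the result instead of being reported.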
Upvotes: 2
Reputation: 1581
You can use
git clone --bare
to clone the repository with only the version-control data (no checked-out files), then run git diff against it.
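If the build step is scripted in Python, the diff against the bare clone could be driven roughly as below. This is only a sketch: the function names and the git_dir argument are placeholders, it assumes the bare clone already exists and contains both commits, and it relies on the tab-separated output of git diff --name-status (paths containing unusual characters are quoted by git unless -z is used, which this parser does not handle).

```python
import subprocess


def parse_name_status(output):
    """Parse `git diff --name-status` output.

    Returns a list of (code, path, previous_path) tuples, where code is the
    first letter of the status field (e.g. "M", "A", "D", "R", "C") and
    previous_path is only set for renames and copies.
    """
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        code = fields[0][0]  # "R100" -> "R"
        if code in ("R", "C") and len(fields) == 3:
            # Renames and copies list the old path first, then the new one.
            changes.append((code, fields[2], fields[1]))
        else:
            changes.append((code, fields[1], None))
    return changes


def changed_files(git_dir, base, head):
    """Run the three-dot diff inside an existing bare clone at git_dir."""
    result = subprocess.run(
        ["git", f"--git-dir={git_dir}", "diff", "--name-status", f"{base}...{head}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_status(result.stdout)
```

Calling changed_files with the path of the bare clone and the two release hashes returns one tuple per changed file, with no limit on the number of files.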
Upvotes: 0