Tom Carver

Reputation: 1018

How can I get a *complete* list of changed files between two commits on GitHub?

(I know similar questions have been asked, e.g. "GitHub API - how to compare 2 commits", but I don't think this is a duplicate.)

As part of our build process we need to compare two commits on GitHub and iterate through every file changed between them. There is a lovely API for comparing commits, but it silently maxes out at 300 changed files. The API supports pagination, but you can only page through the list of commits, not the associated list of files. All my googling suggests that neither the gh CLI nor the GraphQL API supports diffing two commit IDs either.

As far as I can tell, my options are:

  1. Clone the whole repo on every build and run `git diff $lastReleaseHash...$newReleaseHash --name-status` at the command line, which just seems inefficient.
  2. Call GitHub's compare API but ignore its list of files, because that list maxes out at 300. Instead, page through all the commits, request the list of changed files for each commit, and then manually stitch those lists together into an overall diff. That means tracking renames between commits, ignoring files that were created and then deleted within the range, and so on, which is super tedious and error-prone.
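
Once a clone exists, option 1 needs very little code. Below is a minimal sketch that assumes `git` is on the PATH and that `repo_dir` is a local clone containing both commits. `parse_name_status` and `changed_files` are hypothetical helper names. The sketch reports a rename as the old path deleted plus the new path added. It does not unquote paths that Git quotes because they contain unusual characters; parsing `git diff -z` output would avoid that problem.

```python
import subprocess


def parse_name_status(output: str) -> dict[str, str]:
    """Turn `git diff --name-status` output into {path: status letter}.

    Renames (R<score>) and copies (C<score>) list two tab-separated paths.
    Here a rename becomes "D" for the old path and "A" for the new one.
    Quoted paths (names with unusual characters) are NOT unquoted.
    """
    changes: dict[str, str] = {}
    for line in output.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("R"):
            old_path, new_path = fields[1], fields[2]
            changes[old_path] = "D"
            changes[new_path] = "A"
        elif status.startswith("C"):
            # A copy leaves the source untouched; only the destination is new.
            changes[fields[2]] = "A"
        else:
            changes[fields[1]] = status
    return changes


def changed_files(repo_dir: str, base: str, head: str) -> dict[str, str]:
    """Run the question's three-dot diff inside an existing clone and parse it."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-status", f"{base}...{head}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_status(result.stdout)
```

The parser is kept separate from the subprocess call so that it can be tested on captured output without a repository.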

Surely there are better options?!

Upvotes: 5

Views: 1077

Answers (2)

VonC

Reputation: 1329572

Looking for a solution that does not require a full clone. Can multiple GitHub API calls be combined to provide a good solution to this problem?

I did mention in "GitHub API - how to compare 2 commits" that the compare API silently maxes out at 300 files shown, as the OP states.

But I do not know of a better option than iterating over the individual commits.

I wrote and tested an implementation, VonC/pgdiff, using PyGithub:

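    from tqdm import tqdm


    def get_all_changed_files(repo, base_commit, head_commit):
        """Collect every file touched by the commits between base_commit and head_commit.

        `repo` is a PyGithub Repository object.
        """
        all_changed_files = {}
        comparison = repo.compare(base_commit, head_commit)
        commits_list = list(comparison.commits)
        total_commits = len(commits_list)
        print(f"Total commits found: {total_commits}")
        with tqdm(
            total=total_commits, desc="Processing commits", position=0
        ) as pbar_commits:
            for commit in commits_list:
                pbar_commits.update(1)
                # Commit.files is paginated, so list() walks every page of files.
                files_list = list(commit.files)
                total_files = len(files_list)
                pbar_commits.write(
                    f"Total files found for commit '{commit.sha}': {total_files}"
                )
                for file in files_list:
                    filename = file.filename
                    status = file.status

                    if status == "renamed":
                        # Record a rename as removal of the old path plus addition of the new one,
                        # using the same "removed" status the API uses for deletions.
                        all_changed_files[file.previous_filename] = "removed"
                        all_changed_files[filename] = "added"
                    else:
                        # A later commit's status overwrites the one recorded earlier.
                        all_changed_files[filename] = status

        return dict(sorted(all_changed_files.items()))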

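PyGithub's `Commit.files` is paginated, so unlike the file list in the compare response it is not limited to 300 files. GitHub's documentation for the single-commit endpoint does still mention an overall cap of 3000 files per commit.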

I get:

    (python_3.12.4) C:\Users\VonC\git\pgdiff>python pgdiff.py
    Total commits found: 45
    Total files found for commit 'd70600526e2efbea45eeb9dcb55c13f5e0ceba1f': 9
    Total files found for commit '155dc8447d3590ea856bb17919bfc85172b52e09': 3
    ...
    Total files found for commit 'a116aba5d54bf44c6fc27fa1a4c2431d53cf8ff5': 1
    Processing commits: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 45/45 [00:20<00:00,  2.15it/s]

    Added files:
    t/unit-tests/lib-reftable.c
    t/unit-tests/lib-reftable.h
    t/unit-tests/t-reftable-reader.c

    Removed files:
    reftable/reftable-tests.h
    reftable/stack_test.c
    reftable/test_framework.c
    reftable/test_framework.h
    t/t0032-reftable-unittest.sh

    Modified files:
    .gitlab-ci.yml
    Documentation/RelNotes/2.47.0.txt
    Makefile
    ...
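
In the loop above, a later commit's status simply overwrites an earlier one. As a result, a file added and then modified within the range ends up as "modified", and a file created and then deleted ends up as "removed", even though it never existed at the base commit. The following is a hedged sketch of the stitching step the question describes. It folds per-commit `(status, filename, previous_filename)` tuples into one net result. It assumes the commits are supplied oldest first and that the history is linear (merge commits make per-commit file lists ambiguous). It also treats renames as pure moves, so content changes made in the same commit as a rename are not flagged. `net_changes` is a hypothetical helper and is not part of PyGithub or pgdiff.

```python
from typing import Iterable, Optional

# One entry per file in a commit: (status, filename, previous_filename).
# previous_filename is only meaningful when status == "renamed".
Change = tuple[str, str, Optional[str]]


def net_changes(
    commits: Iterable[Iterable[Change]],
) -> dict[str, tuple[str, Optional[str]]]:
    """Fold per-commit file statuses (oldest commit first) into one net diff."""
    live: dict[str, Optional[str]] = {}  # current path -> path at base (None = created in range)
    modified: set[str] = set()  # current paths whose content changed
    gone: set[str] = set()  # base paths deleted somewhere in the range

    for changes in commits:
        for status, path, previous in changes:
            if status == "unchanged":
                continue
            if status in ("added", "copied"):
                live[path] = None
                modified.add(path)
            elif status == "removed":
                # An unseen path must have existed at base, so it is its own origin.
                origin = live.pop(path, path)
                modified.discard(path)
                if origin is not None:  # created-then-deleted files leave no trace
                    gone.add(origin)
            elif status == "renamed" and previous is not None:
                origin = live.pop(previous, previous)
                live[path] = origin
                if previous in modified:
                    modified.discard(previous)
                    modified.add(path)
            else:  # "modified", "changed", ...
                live.setdefault(path, path)
                modified.add(path)

    result: dict[str, tuple[str, Optional[str]]] = {}
    for path, origin in live.items():
        if origin is None:
            result[path] = ("added", None)
        elif origin != path:
            result[path] = ("renamed", origin)
        elif path in modified:
            result[path] = ("modified", None)
    for origin in gone:
        if origin in result and result[origin][0] == "added":
            result[origin] = ("modified", None)  # deleted, then re-created
        elif origin not in result:
            result[origin] = ("removed", None)
    return dict(sorted(result.items()))
```

With PyGithub, each inner list could be built as `[(f.status, f.filename, f.previous_filename) for f in commit.files]`.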

Upvotes: 2

Saddy

Reputation: 1581

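You can use

    git clone --bare

to clone the repository without a working tree, so no files are checked out (the object database, including file contents, is still downloaded). Then run `git diff` on that bare repository.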
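
Below is a minimal Python sketch of that workflow, written to match the other answer's code. `clone_command`, `diff_command` and `bare_clone_diff` are hypothetical helpers, and any URL or revisions passed to them are placeholders. As an assumption beyond this answer, it adds `--filter=blob:none`, which makes a partial "blobless" clone (supported by GitHub) so that file contents are not downloaded up front. It also adds `--no-renames`, so the name-status diff should only need commit and tree objects. If your Git version fetches blobs anyway, the result is the same, just slower. Drop `--no-renames` if you need rename detection, at the cost of Git fetching the blobs it compares.

```python
import subprocess


def clone_command(url: str, dest: str) -> list[str]:
    # --bare: no working tree; --filter=blob:none: skip file contents (partial clone)
    return ["git", "clone", "--bare", "--filter=blob:none", url, dest]


def diff_command(git_dir: str, base: str, head: str) -> list[str]:
    # Three-dot range: changes on head since its merge base with base.
    # --no-renames keeps the comparison at the tree level.
    return [
        "git", "--git-dir", git_dir,
        "diff", "--name-status", "--no-renames", f"{base}...{head}",
    ]


def bare_clone_diff(url: str, dest: str, base: str, head: str) -> str:
    """Clone into dest (which must not already exist) and return the name-status diff."""
    subprocess.run(clone_command(url, dest), check=True)
    result = subprocess.run(
        diff_command(dest, base, head), check=True, capture_output=True, text=True
    )
    return result.stdout
```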

Upvotes: 0
