Mihai Parparita

Reputation: 4236

Finding the oldest commit in a GitHub repository via the API

What is the most efficient way to determine when the initial commit in a GitHub repository was made? Repositories have a created_at property, but for repositories that contain imported history the oldest commit may be significantly older.

When using the command line something like this would work:

git rev-list --max-parents=0 HEAD

However, I don't see an equivalent in the GitHub API.

Upvotes: 11

Views: 6893

Answers (6)

FedFranz

Reputation: 1059

Posting my solution, since none of the others worked for me.

The following script retrieves the list of commits for a given REPO ("owner/repo"), traverses to the last page if necessary, and outputs the JSON object of the last (oldest) commit.

    REPO="owner/repo"
    URL="https://api.github.com/repos/$REPO/commits"
    # Keep the headers in an array so each one reaches curl as a single argument
    H=(-H "Accept: application/vnd.github+json"
       -H "X-GitHub-Api-Version: 2022-11-28")
    
    # Fetch the first page with its response headers, dropping the HTTP status line
    response=$(curl -s -L --include "${H[@]}" "$URL" | awk 'NR > 1')
    
    # Split the output into header and json
    header=$(echo "$response" | awk 'BEGIN{RS="\r\n";ORS="\r\n"} /^[a-zA-Z0-9-]+:/')
    commits=$(echo "$response" | awk '!/^[a-zA-Z0-9-]+:/')
    
    # If paginated, get last page (header names may be lower- or upper-case)
    link_line=$(echo "$header" | grep -i "^link:")
    if [[ -n $link_line ]]; then
      # Extract the last page value
      last_page=$(echo "$link_line" | sed -n 's/.*page=\([0-9][0-9]*\)[^0-9].*rel="last".*/\1/p')
    
      # Get last-page commits
      commits=$(curl -s -L "${H[@]}" "$URL?page=$last_page")
    fi
    
    # Print the oldest commit (the last entry of the last page)
    echo "$commits" | jq '.[-1].commit'
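The Link-header step of the script can also be sketched in Python. This is only an illustration: the function name `last_page_from_link` is made up here, and it assumes the header has the usual `<URL>; rel="name"` entries separated by commas, with no commas inside the URLs.

```python
import re


def last_page_from_link(link_header):
    """Return the page number of the rel="last" entry, or None if there is none."""
    for part in link_header.split(","):
        # Each entry is expected to look like: <URL>; rel="name"
        match = re.search(r'<([^>]*)>\s*;\s*rel="last"', part)
        if match:
            # "[?&]page=" avoids matching the "page=" inside "per_page="
            page = re.search(r"[?&]page=(\d+)", match.group(1))
            return int(page.group(1)) if page else None
    return None
```

A `None` result means the response was not paginated, so the oldest commit is already the last element of the first page.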

Upvotes: 0

Bertrand Martel

Reputation: 45362

Using the GraphQL API, there is a workaround for getting the oldest commit (initial commit) in a specific branch.

First get the latest commit, along with the totalCount and the endCursor:

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

It returns something like this for totalCount and the pageInfo object:

"totalCount": 931886,
"pageInfo": {
  "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}

I couldn't find any documentation on the cursor string format b961f8dc8976c091180839f4483d67b7c2ca2578 0, but I've tested with some other repositories with more than 1000 commits and it always seems to be formatted like:

<static hash> <incremented_number>

So you just replace the number in the cursor with totalCount - 2 (if totalCount is > 1) to get the oldest commit (the initial commit, if you prefer):

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1, after: "b961f8dc8976c091180839f4483d67b7c2ca2578 931884") {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

which gives the following output (initial commit by Linus Torvalds):

{
  "data": {
    "repository": {
      "ref": {
        "target": {
          "history": {
            "nodes": [
              {
                "message": "Linux-2.6.12-rc2\n\nInitial git repository build. I'm not bothering with the full history,\neven though we have it. We can create a separate \"historical\" git\narchive of that later if we want to, and in the meantime it's about\n3.2GB when imported into git - space that would just make the early\ngit days unnecessarily complicated, when we don't have a lot of good\ninfrastructure for it.\n\nLet it rip!",
                "committedDate": "2005-04-16T22:20:36Z",
                "authoredDate": "2005-04-16T22:20:36Z",
                "oid": "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2",
                "author": {
                  "email": "[email protected]",
                  "name": "Linus Torvalds"
                }
              }
            ],
            "totalCount": 931886,
            "pageInfo": {
              "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 931885"
            }
          }
        }
      }
    }
  }
}

A simple implementation in Python to get the first commit using this method:

import requests

token = "YOUR_TOKEN"

name = "linux"
owner = "torvalds"
branch = "master"

query = """
query ($name: String!, $owner: String!, $branch: String!){
  repository(name: $name, owner: $owner) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: 1, after: %s) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
"""

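def getHistory(cursor):
    # cursor must already be a GraphQL literal: null, or a double-quoted string
    r = requests.post("https://api.github.com/graphql",
        headers = {
            "Authorization": f"Bearer {token}"
        },
        json = {
            "query": query % cursor,
            "variables": {
                "name": name,
                "owner": owner,
                "branch": branch
            }
        })
    # Fail early on HTTP errors (e.g. a bad token) instead of on a missing key
    r.raise_for_status()
    return r.json()["data"]["repository"]["ref"]["target"]["history"]

# In the first request the cursor is null, so the history starts at the newest commit
history = getHistory("null")
totalCount = history["totalCount"]
if totalCount > 1:
    # endCursor looks like "<hash> <index>": keep the hash, jump to index totalCount - 2
    cursor = history["pageInfo"]["endCursor"].split(" ")
    cursor[1] = str(totalCount - 2)
    history = getHistory(f"\"{' '.join(cursor)}\"")
    print(history["nodes"][0])
else:
    # Only one commit: the newest commit is also the oldest
    print("got oldest commit (initial commit)")
    print(history["nodes"][0])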

You can find an example in on this post

Upvotes: 9

fregante

Reputation: 31698

This isn't via API, but on GitHub.com: if you have the latest commit SHA and the commit count, you can build the URL to find it:

https://github.com/USER/REPO/commits?after=LAST_COMMIT_SHA+COMMIT_COUNT_MINUS_2

# Example. Commit count in this case was 1573
https://github.com/sindresorhus/refined-github/commits/master
  ?after=a76ed868a84cd0078d8423999faaba7380b0df1b+1571
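Building that URL can be sketched in Python as follows. The function name `oldest_commits_url` and its parameters are made up for illustration, and the sketch assumes the `after=SHA+N` format shown above keeps working on GitHub.com.

```python
def oldest_commits_url(user, repo, branch, last_commit_sha, commit_count):
    """Build the GitHub.com commits URL that skips to the oldest commit."""
    base = f"https://github.com/{user}/{repo}/commits/{branch}"
    if commit_count < 2:
        # With a single commit there is nothing to skip
        return base
    # Skipping past index commit_count - 2 leaves only the oldest commit
    return f"{base}?after={last_commit_sha}+{commit_count - 2}"
```

With the example values above (commit count 1573), this yields the same URL ending in `+1571`.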

Upvotes: -2

JamesSchiiller

Reputation: 83

Use trial and error on the page number:

https://github.com/fatfreecrm/fat_free_crm/commits/master?page=126

Looking at the git history (with gitk, for instance) can make the trial and error more efficient.
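If you know the commit count from a local clone (for example from `git rev-list --count HEAD`), the guess can be narrowed with a little arithmetic. This is a rough sketch: the function name is made up, and `per_page` is an assumption you have to check yourself by counting the commits shown on one page.

```python
def last_page_number(commit_count, per_page):
    """Estimate the number of the last commits page."""
    if commit_count < 1 or per_page < 1:
        raise ValueError("commit_count and per_page must be positive")
    # Ceiling division: a partially filled final page still counts as a page
    return (commit_count + per_page - 1) // per_page
```

If the estimate is off by a page or two (for instance because of a different page size), it is still a much better starting point for the trial and error.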

Upvotes: -2

Mihai Parparita

Reputation: 4236

This can be done in as few as two requests, if data is already cached (on GitHub's side) and depending on your precision requirements.

First check to see if there are in fact commits before the creation time by doing a GET for /repos/:owner/:repo/commits with the until parameter set to the creation time (as suggested by VonC's answer) and limiting the number returned to 1 commit (via the per_page parameter).

If there are commits before the creation time, then the contributors statistics endpoint (/repos/:owner/:repo/stats/contributors) can be invoked. The response has a weeks list per contributor, and the oldest w value there is the same week as the oldest commit.

If you need a precise timestamp, you can then use the commits listing endpoint again, with since set to the oldest week value and until set to 7 days after it.

Note that the statistics endpoint may return a 202 indicating that statistics are not available, in which case a retry in a few seconds is required.
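The week lookup and the follow-up query window can be sketched like this. It is a sketch under assumptions: the function names are made up, the input is the already-parsed JSON from the contributors statistics endpoint, and `w` is taken to be a Unix timestamp marking the start of a week.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def oldest_week_start(contributor_stats):
    """Return the smallest `w` value across every contributor's weeks list."""
    return min(
        week["w"]
        for contributor in contributor_stats
        for week in contributor["weeks"]
    )


def week_window_params(week_start):
    """Build since/until parameters covering the 7 days from week_start."""
    start = datetime.fromtimestamp(week_start, tz=timezone.utc)
    end = start + timedelta(days=7)
    return {"since": start.strftime(ISO_FORMAT), "until": end.strftime(ISO_FORMAT)}
```

The resulting parameters would go on the commits listing request. Since commits come back newest first, the oldest commit is the last item returned for that window (on the last page, if the week holds more commits than one page).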

Upvotes: 3

VonC

Reputation: 1323943

One suggestion would be to list commits on a repo (see the GitHub API v3 section), using the until parameter, set to the creation date of the repo (plus one day, for instance).

GET /repos/:owner/:repo/commits

That way, you would list only the commits made at or before the time the repo was created, which limits the list by excluding all the commits made after the repo creation.
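Computing the until value can be sketched as below. The function names are made up, and the sketch assumes created_at arrives in the `YYYY-MM-DDTHH:MM:SSZ` form that the repository object uses.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def until_after_creation(created_at, margin_days=1):
    """Return created_at shifted forward by margin_days, in the same format."""
    created = datetime.strptime(created_at, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return (created + timedelta(days=margin_days)).strftime(ISO_FORMAT)


def commits_until_path(owner, repo, created_at):
    """Build the request path for listing commits up to just after creation."""
    query = urlencode({"until": until_after_creation(created_at)})
    return f"/repos/{owner}/{repo}/commits?{query}"
```

The list is still ordered newest first, so for a repository with a long imported history the oldest commit may be several pages in.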

Upvotes: 2
