Reputation: 3175
I'm using the following code to count a repo's stars, but it returns only 40000 stars for the Bootstrap repo, which is lower than the actual 70717. However, it returns the correct count (31445) for the jQuery repo. Why is the star count for Bootstrap wrong?
#!/usr/bin/python
from github import Github
# XXX: Specify your own access token here
ACCESS_TOKEN = ''
client = Github(ACCESS_TOKEN, per_page=100)
# Specify a username and repository of interest for that user.
REPO_LIST=[('twbs','bootstrap'),('jquery','jquery')]
for USER, REPO in REPO_LIST:
    user = client.get_user(USER)
    repo = user.get_repo(REPO)
    # Get a list of people who have bookmarked the repo.
    # Since you'll get a lazy iterator back, you have to traverse
    # it if you want to get the total number of stargazers.
    stargazers = [s for s in repo.get_stargazers()]
    print("Number of stargazers", len(stargazers))
Upvotes: 5
Views: 1184
Reputation: 3072
The GitHub API limits pagination to 400 pages.
In the past, nobody hit this limit when pulling data from GitHub projects, because the number of records being pulled (e.g., stars in your question, or issue events in this post) stayed below 40,000 (i.e., 400 pages times 100 records per page).
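The arithmetic behind the cap can be sketched as follows. This is only an illustration, assuming the 400-page limit stated in this answer and the per_page=100 setting from the question; the function name `reachable_records` is hypothetical and not part of any library.

```python
def reachable_records(total, per_page=100, max_pages=400):
    """Return how many records pagination can reach under a page cap.

    max_pages=400 is the limit described in this answer (an assumption
    taken from the text, not a value queried from the API).
    """
    return min(total, per_page * max_pages)


print(reachable_records(70717))  # twbs/bootstrap: capped at 40000
print(reachable_records(31445))  # jquery/jquery: below the cap, 31445
```

This matches the numbers in the question: Bootstrap's list is truncated at 40000, while jQuery's fits under the cap.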
Nowadays, some projects (like twbs/bootstrap or rails/rails) have grown so large that the current pagination cannot retrieve the full data, and as of now, I don't see any mechanism that solves this issue.
This is something GitHub should address by reconsidering its API design.
Upvotes: 5
Reputation: 381
The response body will indicate if pagination is limited for a given resource listing:
❯ curl https://api.github.com/repos/twbs/bootstrap/stargazers\?per_page\=100\&page\=401
{
  "message": "In order to keep the API fast for everyone, pagination is limited for this resource. Check the rel=last link relation in the Link response header to see how far back you can traverse.",
  "documentation_url": "https://developer.github.com/v3/#pagination"
}
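To act on that message, a client can read the rel="last" entry of the Link response header. The sketch below parses such a header with the Python standard library only. The helper name `last_page` is hypothetical, and the sample header value is made up for illustration rather than captured from a real response. It also assumes the URLs contain no commas, since it splits the header on them.

```python
import re
from urllib.parse import urlparse, parse_qs


def last_page(link_header):
    """Return the page number of the rel="last" link, or None if absent."""
    for part in link_header.split(","):
        # Each entry is expected to look like: <URL>; rel="name"
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "last":
            query = parse_qs(urlparse(match.group(1)).query)
            pages = query.get("page")
            return int(pages[0]) if pages else None
    return None


# Illustrative header value (made up for this example, not captured output).
sample = (
    '<https://api.github.com/repos/twbs/bootstrap/stargazers'
    '?per_page=100&page=2>; rel="next", '
    '<https://api.github.com/repos/twbs/bootstrap/stargazers'
    '?per_page=100&page=400>; rel="last"'
)
print(last_page(sample))  # 400
```

Comparing the returned page number with the number of pages you expected tells you whether the listing was truncated.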
Upvotes: 7