Reputation: 33
I'm setting up a script to export all commits and pull requests for a larger list of GitHub repositories (about 4,000).
Now that the basic idea of the script works, I need a way to loop through all pages of commits for a repository.
I found out that I can export 100 commits per page. Some repos have many more commits (like 8,000), so that would be 80 pages I need to loop through.
I can't find a way to extract the number of pages from the GitHub API.
What I've done so far is set up the script so that it loops through all commits and exports them to a txt/csv file.
What I need is to know the total number of pages before I start looping through the commits of a repo.
The following gives me the number of pages, but in a form I can't use:
curl -u "user:password" -I "https://api.github.com/repos/0chain/rocksdb/commits?per_page=100"
RESULT:
Link: <https://api.github.com/repositories/152923130/commits?per_page=100&page=2>; rel="next", <https://api.github.com/repositories/152923130/commits?per_page=100&page=75>; rel="last"
I need the value 75 (or whatever value other repos return) to be used as a variable in a loop, like so:
repolist=`cat repolist.txt`
repolistarray=($(echo $repolist))
repolength=$(echo "${#repolistarray[@]}")
for (( i = 0; i < $repolength; i++ )); do
    # here I need to extract the page number
    pagenumber=$(curl -u "user:password" -I "https://api.github.com/repos/$(echo "${repolistarray[i]}")/commits?per_page=100")
    for (( n = 1; n <= $pagenumber; n++ )); do
        curl -u "user:password" -s "https://api.github.com/repos/$(echo "${repolistarray[i]}")/commits?per_page=100&page=$(echo "$n")" >committest.txt
    done
done
How can I get the "75" (or whatever other repos return) out of this:
Link: <https://api.github.com/repositories/152923130/commits?per_page=100&page=2>; rel="next", <https://api.github.com/repositories/152923130/commits?per_page=100&page=75>; rel="last"
to be used as "n"?
Upvotes: 2
Views: 1365
Reputation: 804
The official GitHub CLI (gh) supports a --paginate flag that does the heavy lifting for you. Combined with jq, you can get the answers you're looking for.
This is simpler, and should be more robust than the other Bash solutions posted earlier.
Total number of commits in the last 90 days:
gh api --paginate \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"/repos/sindresorhus/awesome/commits?since=$(date -I -v-90d)&per_page=100" |
jq length
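Note that date -v-90d is BSD/macOS syntax; on GNU/Linux the equivalent would be date -I -d '90 days ago' (and date -I -d '6 months ago' for the example below). Also be aware that when more than 100 commits match, --paginate concatenates one JSON array per page, so jq length prints one count per page rather than a grand total. A sketch of a GNU/Linux variant that slurps the pages into a single array before counting:
gh api --paginate \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"/repos/sindresorhus/awesome/commits?since=$(date -I -d '90 days ago')&per_page=100" |
jq -s 'map(length) | add' # combine the per-page arrays, then sum their lengths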
Number of commits for the last 6 months, broken down by month, as CSV:
gh api --paginate \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"/repos/sindresorhus/awesome/commits?since=$(date -I -v-6m)&per_page=100" |
jq -r 'map(. + {month: (.commit.committer.date[:7])}) |
group_by(.month)[] | [(.[0].month), length] | @csv'
Output:
"2023-01",1
"2023-02",6
"2023-03",3
"2023-04",5
"2023-05",3
"2023-06",11
Upvotes: 0
Reputation: 4340
Here is something along the lines of what @Poshi commented: loop indefinitely, requesting the next page until you hit an empty page, then break out of the inner loop and move on to the next repo.
# this is the contents of a page past the last real page:
emptypage='[

]'
# here's a simpler way to iterate over each repo than using a bash array
cat repolist.txt | while read -r -d' ' repo; do
# loop indefinitely
page=0
while true; do
page=$((page + 1))
# minor improvement: use a variable, not a file.
# also, you don't need to echo variables, just use them
result=$(curl -u "user:password" -s \
"https://api.github.com/repos/$repo/commits?per_page=100&page=$page")
# if the result is empty, break out of the inner loop
[ "$result" = "$emptypage" ] && break
echo "$result" > committest.txt
# note that > overwrites (whereas >> appends),
# so committest.txt will be overwritten with each new page.
#
# in the final version, you probably want to process the results here,
# and then
#
# echo "$processed_results"
# done > repo1.txt
# done
#
# to output once per repo, or
#
# echo "$processed_results"
# done
# done > all_results.txt
#
# to output all results to a single file
done
done
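For the processing step, here is a minimal sketch of the one-file-per-repo variant. It assumes jq is available and reuses the emptypage sentinel from above; the sha/date fields and the filename mangling are just illustrative choices, not part of the original question:
cat repolist.txt | while read -r -d' ' repo; do
    page=0
    while true; do
        page=$((page + 1))
        result=$(curl -u "user:password" -s \
            "https://api.github.com/repos/$repo/commits?per_page=100&page=$page")
        # stop when we run past the last page
        [ "$result" = "$emptypage" ] && break
        # emit one CSV line per commit: sha, author date
        echo "$result" | jq -r '.[] | [.sha, .commit.author.date] | @csv'
    done > "$(echo "$repo" | tr '/' '_').csv" # e.g. 0chain/rocksdb -> 0chain_rocksdb.csv
done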
Upvotes: 1
Reputation: 5762
Well, the method you ask for is not the most common one; usually this is done by fetching pages until no more data is available. But to answer your specific question, we must parse the line that contains the information. A quick and dirty way to do it could be:
response='Link: <https://api.github.com/repositories/152923130/commits?per_page=100&page=2>; rel="next", <https://api.github.com/repositories/152923130/commits?per_page=100&page=75>; rel="last"'
<<< "$response" cut -f2- -d: | # First, get the contents of "Link": everything after the first colon
tr "," $'\n' |                 # Separate the different parts onto different lines
grep 'rel="last"' |            # Select the line with the last-page information
cut -f1 -d';' |                # Keep only the URL
tr -d ' <>' |                  # Strip the angle brackets and surrounding whitespace
tr "?&" $'\n' |                # Split the URL and its parameters, one per line
grep -e "^page" |              # Select the "page" parameter
cut -f2 -d=                    # Finally, extract the number we are interested in
There are other ways to do this with fewer commands, maybe simpler ones, but this version allows me to go step by step with the explanation. One of those other ways could be:
<<< "$response" sed 's/.*&page=\(.*\)>; rel="last".*/\1/'
This one makes some assumptions, like the page parameter always being the last one in the URL.
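To plug this into the loop from the question, fetch the headers once per repo, extract the last-page number, and feed it to the inner loop. A minimal sketch for a single repo (the commitlist.txt name is just for illustration; note that when a repo has 100 or fewer commits, GitHub sends no Link header at all, so we fall back to 1 page):
repo="0chain/rocksdb"
link=$(curl -u "user:password" -sI \
    "https://api.github.com/repos/$repo/commits?per_page=100" | grep -i '^link:')
pagenumber=$(<<< "$link" sed 's/.*&page=\(.*\)>; rel="last".*/\1/')
[ -z "$pagenumber" ] && pagenumber=1 # no Link header: everything fits on one page
for (( n = 1; n <= pagenumber; n++ )); do
    curl -u "user:password" -s \
        "https://api.github.com/repos/$repo/commits?per_page=100&page=$n" >> commitlist.txt
done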
Upvotes: 0