cesarsotovalero

Reputation: 1327

How to get a list of the companies and organizations that contributed to a repository on GitHub?

I need to retrieve the names of all the organizations that the contributors of a GitHub repository belong to. I know that the following API request returns the first 100 contributors of a repository, ordered by their number of contributions:

https://api.github.com/repos/{owner}/{repo}/contributors?per_page=100&anon=true

However, this API call only returns the metadata of the first 100 contributors of a repo, and it includes no information about which organizations they belong to.

What I need is to iterate through the list of all contributors and retrieve all the organizations they belong to. Is there any way to get this information?

Upvotes: 0

Views: 433

Answers (1)

cesarsotovalero

Reputation: 1327

GitHub users may belong to several organizations. An organization's administrator can accept or reject a GitHub user as a member. Separately, GitHub users can declare that they work for a particular company, and this information is displayed on their GitHub profile. Here is an illustrative example:

[Image: Example of a GitHub user with company and organization]

This information can be obtained via the GitHub API using bash commands (notably curl, jq, and grep). Here are working script examples of how to get this data for the GitHub repository ConsenSys/teku.

Get the list of contributing organizations

#!/bin/bash

# Get the organizations API URL of each user that contributed to the ConsenSys/teku project.
# Store the list of URLs in a txt file for post-processing.
curl --fail --silent --show-error 'https://api.github.com/repos/ConsenSys/teku/contributors?per_page=100&page=1&anon=true' | jq -r '.[].organizations_url' | grep 'https' > teku_contributors_organizations_url.txt

# Iterate over the list of URLs and write each curl result to a txt file.
# Redirecting the whole loop with ">" avoids duplicated data when the script is run again.
while IFS= read -r url; do
    content="$(curl -s "$url")"
    echo "$content"
done < teku_contributors_organizations_url.txt > teku_contributors_organizations_url_data.txt

# Show the organizations that contributed to the project, ordered by number of instances.
grep '"login"' teku_contributors_organizations_url_data.txt | sort | uniq -c | sort -nr > teku_contributors_organizations.txt

OUTPUT

   4   "login": "ConsenSys",
   3   "login": "hyperledger",
   3   "login": "arithm3tica",
   3   "login": "PegaSysEng",
   3   "login": "EntEthAlliance",
   2   "login": "apache",
   1   "login": "tmio",
   1   "login": "splunkdlt",
   1   "login": "solsuite",
   1   "login": "sigp",
   1   "login": "puniverse",
   1   "login": "prrkl",
   1   "login": "openethereum",
   1   "login": "mana-ethereum",
   1   "login": "jbosgi",
   1   "login": "goerli",
   1   "login": "exthereum",
   1   "login": "ethsearch",
   1   "login": "ethjs",
   1   "login": "ethereum",
   1   "login": "eth-clients",
   1   "login": "eclipse",
   1   "login": "deltap2p",
   1   "login": "dappnode",
   1   "login": "byz-f",
   1   "login": "arquillian",
   1   "login": "argentlabs",
   1   "login": "Thera169",
   1   "login": "InternetOfPeers",
   1   "login": "Department-of-Decentralization",
   1   "login": "ChainSafe",
   1   "login": "Centareum"
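
The tally above keeps the JSON punctuation around each name. As a minimal sketch of a cleaner tally, assuming the saved responses are pretty-printed with one field per line (as in the output above), a small helper can extract just the names. The function name count_logins is hypothetical:

```shell
#!/bin/bash

# Read raw API responses on stdin and print "count login" lines,
# most frequent first. Only lines of the form   "login": "value",   are used.
count_logins() {
    sed -n 's/^[[:space:]]*"login":[[:space:]]*"\([^"]*\)".*$/\1/p' \
        | sort | uniq -c | sort -k1,1nr -k2,2 \
        | awk '{ print $1, $2 }'
}
```

It could be used as: count_logins < teku_contributors_organizations_url_data.txt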

Get the list of contributing companies

#!/bin/bash

# Get the GitHub API URL of each user that contributed to the ConsenSys/teku project.
# Store the list of URLs in a txt file for post-processing.
curl --fail --silent --show-error 'https://api.github.com/repos/ConsenSys/teku/contributors?per_page=100&page=1&anon=true' | jq -r '.[].url' | grep 'https' > teku_contributors_urls.txt

# Iterate over the list of URLs and write each curl result to a txt file.
# Redirecting the whole loop with ">" avoids duplicated data when the script is run again.
while IFS= read -r url; do
    content="$(curl -s "$url")"
    echo "$content"
done < teku_contributors_urls.txt > teku_contributors_data.txt

# Show the list of companies that contributed to the project, ordered by number of instances.
grep '"company"' teku_contributors_data.txt | sort | uniq -c | sort -nr > teku_contributors_companies.txt

OUTPUT

   5   "company": "Consensys",
   3   "company": "ConsenSys",
   2   "company": "AlmavivA",
   2   "company": "@tmio  ",
   2   "company": "@sushiswap ",
   2   "company": "@eucrypt",
   2   "company": "@element-fi",
   2   "company": "@derivadex",
   2   "company": "@blk-io",
   2   "company": "@PegaSysEng | @ConsenSys",
   2   "company": "@Consensys",
   2   "company": "@ConsenSys @hyperledger ",
   2   "company": "@ConsenSys ",
   2   "company": "@ChainSafe ",
   2   "company": "@ArgentLabs",
   1   "company": "RedHat Inc.",
   1   "company": "Hedera @hashgraph",
   1   "company": "Ethereum",
   1   "company": "Contract Worker for the Ethereum Foundation",
   1   "company": "Base2 Cloud",
   1   "company": "Baisysoft",
   1   "company": "@coinbase",
   1   "company": "@KiFoundation "
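
Because the company field is free text, the same company shows up under several spellings above ("Consensys", "@ConsenSys ", "@Consensys"). A rough sketch of one way to merge them follows. The rules are assumptions made for illustration, not part of the GitHub API: "@" is dropped, names are lowercased, surrounding spaces are trimmed, and "|" is treated as a separator between companies. The function name normalize_companies is hypothetical:

```shell
#!/bin/bash

# Read raw user profiles on stdin and print "count company" lines,
# most frequent first. Lines with "company": null are ignored.
normalize_companies() {
    sed -n 's/^[[:space:]]*"company":[[:space:]]*"\([^"]*\)".*$/\1/p' \
        | tr '|' '\n' \
        | sed -e 's/@//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | tr '[:upper:]' '[:lower:]' \
        | grep -v '^$' \
        | sort | uniq -c | sort -k1,1nr -k2,2 \
        | awk '{ count = $1; $1 = ""; sub(/^ /, ""); print count, $0 }'
}
```

It could be used as: normalize_companies < teku_contributors_data.txt. Fields that list several companies separated only by spaces, such as "@ConsenSys @hyperledger ", are still counted as a single entry under these rules.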

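NOTE: Unauthenticated requests to the GitHub API are limited to 60 per hour, and these scripts send one request per contributor, so for repositories with many contributors you may get the message: “API rate limit exceeded.” This can be solved by authenticating with your GitHub account, adding the flag -u <user>:<token> to each curl call. Also note that page=1 returns at most the first 100 contributors; increase the page parameter to retrieve the rest.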
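
To cover every contributor rather than only the first page, the requests can be paginated and authenticated. The following is a minimal sketch, assuming jq is installed. The environment variables GITHUB_USER and GITHUB_TOKEN and all three function names are hypothetical, chosen for this example:

```shell
#!/bin/bash

# Build the contributors URL for one result page (100 entries per page).
contributors_page_url() {
    local owner="$1" repo="$2" page="$3"
    printf 'https://api.github.com/repos/%s/%s/contributors?per_page=100&page=%s&anon=true\n' \
        "$owner" "$repo" "$page"
}

# Download a URL, authenticating when GITHUB_USER and GITHUB_TOKEN are set
# (both variable names are assumptions made for this sketch).
github_get() {
    if [ -n "${GITHUB_USER:-}" ] && [ -n "${GITHUB_TOKEN:-}" ]; then
        curl --fail --silent --show-error -u "$GITHUB_USER:$GITHUB_TOKEN" "$1"
    else
        curl --fail --silent --show-error "$1"
    fi
}

# Print the profile URL of every contributor, page by page,
# stopping at the first empty page. Anonymous contributors have
# no profile URL and are skipped.
all_contributor_urls() {
    local owner="$1" repo="$2" page=1 body
    while :; do
        body="$(github_get "$(contributors_page_url "$owner" "$repo" "$page")")" || return 1
        [ "$(printf '%s' "$body" | jq 'length')" -eq 0 ] && break
        printf '%s' "$body" | jq -r '.[].url // empty'
        page=$((page + 1))
    done
}
```

It could be used as: all_contributor_urls ConsenSys teku > teku_contributors_urls.txt. Replacing .url with .organizations_url in the jq filter would produce the input of the organizations script instead.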

Upvotes: 1
