jar
jar

Reputation: 2908

Why am I able to bypass pagination when I call the same field twice (with different queries) in GitHub's GraphQL API

I noticed something I don't understand while trying to get the number of open issues per repository for a user.

When I use the following query I am asked to perform pagination (as expected) -

query {
  user(login:"armsp"){
    repositories{
      nodes{
        name
        issues(states: OPEN){
          totalCount
        }
      }
    }    
  }
}

The error message after running the above -

{
  "data": {
    "user": null
  },
  "errors": [
    {
      "type": "MISSING_PAGINATION_BOUNDARIES",
      "path": [
        "user",
        "repositories"
      ],
      "locations": [
        {
          "line": 54,
          "column": 5
        }
      ],
      "message": "You must provide a `first` or `last` value to properly paginate the `repositories` connection."
    }
  ]
}

However when I do the following I actually get all the results which doesn't make any sense to me -

query {
  user(login:"armsp"){
    repositories{
      totalCount
    }
    repositories{
      nodes{
        name
        issues(states: OPEN){
          totalCount
        }
      }
    }   
  }
}

Shouldn't I be asked for pagination in the second query too ?

Upvotes: 6

Views: 1983

Answers (1)

Daniel Rearden
Daniel Rearden

Reputation: 84837

TLDR; This appears to be a bug. There's no way to bypass the limit applied when fetching a list of resources.

Limiting responses like this is a common feature of public APIs -- if the response could include thousands or millions of results, it'll tie up a lot of server resources to fulfill it all at once. Allowing users to make those sort of queries is both costly and a potential security risk.

Github's intent appears to be to always limit the amount of results when fetching a list of resources. This isn't well documented on the GraphQL side, but matches the behavior of their REST API:

Requests that return multiple items will be paginated to 30 items by default. You can specify further pages with the ?page parameter. For some resources, you can also set a custom page size up to 100 with the ?per_page parameter.

For connections, it looks like the check for the first or last parameter is only ran whenever the nodes field is present in the selection set. This makes sense, since this is ultimately the field we want to limit -- requesting other fields like totalDiskUsage or totalDiskUsage, even without a limit argument, is harmless with the regard to above concerns.

Things get funky when you consider how GraphQL handles selection sets with selections that have the same name. Without getting into the nitty gritty details, GraphQL will let you request the same field multiple times. If the field in question has a selection set, it will effectively merge the selection sets into a single one. So

query {
  user(login:"armsp") {
    repositories {
      totalCount
    }
    repositories {
      totalDiskUsage
    }
  }
}

becomes and is equivalent to

query {
  user(login:"armsp") {
    repositories {
      totalCount
      totalDiskUsage
    }
  }
}

Side note: The above does not hold true if you explicitly give one of the fields an alias since then the two fields have different response names.

All that to say, technically this query:

query {
  user(login:"armsp"){
    repositories{
      totalCount
    }
    repositories{
      nodes{
        name
        issues(states: OPEN){
          totalCount
        }
      }
    }   
  }
}

should also blow up with the same MISSING_PAGINATION_BOUNDARIES error. The fact that it doesn't means the selection set merging is somehow borking the check that's in place. This is clearly a bug. However, even while this appears to "work", it still doesn't get around whatever limits Github has applies at the storage layer -- you will always get at most 100 results even when exploiting the above bug.

Upvotes: 2

Related Questions