How many SQL database calls are made when you do a deeply nested GraphQL query?

Question

I know with GraphQL you are to implement the backend handlers for the queries. So if you are using PostgreSQL, you might have a query like this:

query {
  authors {
    id
    name
    posts {
      id
      title
      comments {
        id
        body
        author {
          id
          name
        }
      }
    }
  }
}

The naive solution would be to do something like this:

const resolvers = {
  Query: {
    authors: () => {
      // somewhat realistic sql pseudocode
      return knex('authors').select('*')
    },
  },
  Author: {
    posts: (author) => {
      return knex('posts').where('author_id', author.id)
    },
  },
  Post: {
    comments: (post) => {
      return knex('comments').where('post_id', post.id)
    },
  },
};

However, this would be a pretty big problem. It would do the following essentially:

Make 1 query for all authors.
For each authors, make query for all posts. (n + 1 query)
For each post, make query for all comments. (n + 1 query)

So it's like a fanning out of queries. If there were 20 authors, each with 20 posts, that would be 21 db calls. If each post had 20 comments, that would be 401 db calls! 20 authors resolves 400 posts, which resolves 8000 comments, not like this is a real way you would do it, but to demonstrate the point. 1 -> 20 -> 400 db calls.

If we add the comments.author calls, that's another 8000 db calls (one for each comment)!

How would you batch this into let's say 3 db calls (1 for each type)? Is that what optimized GraphQL query resolvers do essentially? Or what is the best that can be done for this situation?

Ken Chan · Accepted Answer

This is the GraphQL N+1 loading issue.

Basically there are two ways to solve it (For simplicity , assume it only needs to load the authors and its posts)

Use Dataloader pattern. Basically its idea is to defer the actual loading time of the posts of each author to a particular time such that the posts for N authors can be batched loaded together by a single SQL. It also provides caching feature to further improve the performance for the same request.
Use "look ahead pattern" (A Java example is described at here) . Basically its idea is that when resolving the authors , you just look ahead to see if the query includeS the posts or not in the sub fields. If yes , you can then use a SQL join to get the authors together with its post in a single SQL.

Also , to prevent the malicious client from making a request that retrieve a very big graph , some GraphQL server will analyse the query and impose a depth limit on it.

How many SQL database calls are made when you do a deeply nested GraphQL query?

Answers (1)

Related Questions