Lin Du

Reputation: 102527

Is a circular reference in a GraphQL schema an anti-pattern?

Given a GraphQL schema like this:

type User {
  id: ID!
  location: Location
}

type Location {
  id: ID!
  user: User
}

Now, the client sends a GraphQL query. Theoretically, User and Location can reference each other circularly without limit.

I think it's an anti-pattern. As far as I know, neither GraphQL itself nor the Apollo community offers middleware or any other way to limit the nesting depth of a query.

A query with unbounded nesting depth will cost my system a lot of resources, such as bandwidth, hardware, and performance, not only on the server side but also on the client side.
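To make the concern concrete, here is a minimal TypeScript sketch that generates such a query at any depth. It assumes a root field user(id: ID) exists (the schema above does not show a Query type); buildNestedQuery and the ID "1" are made-up names for illustration only.

```typescript
// Hypothetical helper: builds a query against the User/Location schema that
// alternates location -> user -> location ... `depth` levels below the root.
// Assumes a root field `user(id: ID)`, which the schema above does not show.
function buildNestedQuery(depth: number): string {
  let selection = "id";
  // Build from the innermost level outwards.
  for (let level = depth; level >= 1; level--) {
    // Odd levels hang off a User (-> location), even levels off a Location (-> user).
    const field = level % 2 === 1 ? "location" : "user";
    selection = `id ${field} { ${selection} }`;
  }
  return `{ user(id: "1") { ${selection} } }`;
}

// Nothing in the schema stops a client from asking for depth 1000.
const deepQuery = buildNestedQuery(1000);
console.log(deepQuery.length);
```

Every extra level is valid according to the schema, so any cap on depth has to come from somewhere else.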

So, if a GraphQL schema allows circular references, there should be some middleware or other way to limit the nesting depth of a query, or to add constraints to the query.

Maybe disallowing circular references is a better idea?

I prefer sending another query and doing multiple operations in one query. It's much simpler.

Update

I found this library: https://github.com/slicknode/graphql-query-complexity. Even if GraphQL itself doesn't limit circular references, this library can protect your application against resource exhaustion and DoS attacks.

Upvotes: 29

Views: 14328

Answers (4)

dlq

Reputation: 3229

The above answers provide good theoretical discussion on the question. I would like to add more practical considerations that occur in software development.

As @daniel-rearden points out, a consequence of circular references is that multiple query documents can retrieve the same data. In my experience, this is a bad practice because it makes client-side caching of GraphQL requests less predictable and more difficult: a developer has to explicitly specify that the documents return the same data in a different structure.

Furthermore, in unit testing, it is difficult to generate mock data for objects whose fields/properties contain circular references to the parent (at least in JS/TS; if there are languages that support this easily out of the box, I'd love to hear about it in a comment).
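As a rough illustration of the awkwardness in TypeScript, here is a minimal sketch. MockUser and MockLocation are hypothetical types mirroring the question's schema, not something from a real project.

```typescript
// Hypothetical mock types mirroring the User/Location schema.
interface MockUser {
  id: string;
  location?: MockLocation;
}
interface MockLocation {
  id: string;
  user?: MockUser;
}

// A circular mock cannot be written as one object literal:
// create both objects first, then link them in a second step.
const mockUser: MockUser = { id: "u1" };
const mockLocation: MockLocation = { id: "l1", user: mockUser };
mockUser.location = mockLocation;

// Common test utilities then trip over the cycle; for example,
// JSON.stringify throws a TypeError on circular structures.
let serializationFailed = false;
try {
  JSON.stringify(mockUser);
} catch (_err) {
  serializationFailed = true;
}
console.log(serializationFailed);
```

So the mock can be built, but snapshotting, serializing, or deep-cloning it needs extra care.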

Maintaining a clear data hierarchy seems to be the best choice for understandable and maintainable schemas. If a reference to a field's parent is frequently needed, it is perhaps best to build a separate query.

Aside: Truthfully, if it were not for the practical consequences of circular references, I would love to use them. It would be beautiful and amazing to represent data structures as a "mathematically perfect" directed graph.

Upvotes: 2

machineghost

Reputation: 35820

TL;DR: Circular references are an anti-pattern for non-rate-limited GraphQL APIs. APIs with rate limiting can safely use them.

Long Answer: Yes, true circular references are an anti-pattern on smaller/simpler APIs ... but when you get to the point of rate-limiting your API you can use that limiting to "kill two birds with one stone".

A perfect example of this was given in one of the other answers: GitHub's GraphQL API lets you request a repository, with its owner, with their repositories, with their owners ... infinitely ... or so you might think from the schema.

If you look at the API though (https://developer.github.com/v4/object/user/) you'll see their structure isn't directly circular: there are types in between. For instance, User doesn't reference Repository, it references RepositoryConnection. Now, RepositoryConnection does have an edges field of type [RepositoryEdge], and each RepositoryEdge does have a node field of type Repository ...

... but when you look at the implementation of the API (https://developer.github.com/v4/guides/resource-limitations/) you'll see that the resolvers behind the types are rate-limited (i.e. no more than X nodes per query). This guards both against consumers who request too much (breadth-based issues) and consumers who request infinitely (depth-based issues).

GitHub can allow circular references because it puts the burden of keeping queries from becoming circular on the consumer. If the consumer fails to do so, the query fails because of the rate limiting.

This lets responsible users ask for the user, of the repository, owned by the same user ... if they really need that ... as long as they don't keep asking for the repositories owned by the owner of that repository, owned by ...

Thus, GraphQL APIs have two options:

  • avoid circular references (I think this is the default "best practice")
  • allow circular references, but limit the total nodes that can be queried per call, so that infinite circles aren't possible
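The second option can be sketched roughly as follows. This is not GitHub's actual implementation, only a minimal TypeScript illustration of counting requested nodes before executing a query. ConnectionRequest, estimateNodes, checkNodeBudget, and the budget of 500000 are all made up for the example, and it assumes every list field takes a page-size argument called first.

```typescript
// Hypothetical, simplified view of the paginated list fields in a query.
interface ConnectionRequest {
  field: string;
  first: number; // requested page size
  children?: ConnectionRequest[];
}

// Each nested connection can be returned once per parent node, so its
// worst-case node count is the parent count multiplied by its page size.
function estimateNodes(conn: ConnectionRequest, parentCount: number = 1): number {
  const ownCount = parentCount * conn.first;
  let total = ownCount;
  for (const child of conn.children || []) {
    total += estimateNodes(child, ownCount);
  }
  return total;
}

// Reject the query up front if it could exceed the node budget.
function checkNodeBudget(root: ConnectionRequest, maxNodes: number): void {
  const estimate = estimateNodes(root);
  if (estimate > maxNodes) {
    throw new Error(`Query may return ${estimate} nodes; limit is ${maxNodes}`);
  }
}

// repositories -> owner -> repositories -> owner -> repositories
const cyclicQuery: ConnectionRequest = {
  field: "repositories",
  first: 100,
  children: [
    {
      field: "repositories",
      first: 100,
      children: [{ field: "repositories", first: 100 }],
    },
  ],
};

console.log(estimateNodes(cyclicQuery)); // 100 + 10000 + 1000000 = 1010100
```

Because the estimate grows multiplicatively with every level, a circular query runs out of budget after a few levels, while a query that goes around the circle once or twice still passes.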

If you don't want to rate-limit, GitHub's approach of using different types can still give you a clue to a solution.

Let's say you have users and repositories: you need two types for both, a User and UserLink (or UserEdge, UserConnection, UserSummary ... take your pick), and a Repository and RepositoryLink.

Whenever someone requests a user via a root query, you return the User type. But that User type would not have:

repositories: [Repository]

it would have:

repositories: [RepositoryLink]

RepositoryLink would have the same "flat" fields as Repository has, but none of its potentially circular object fields. Instead of owner: User, it would have owner: ID.
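In TypeScript terms, the shape might look something like the sketch below. The row types, the login and name fields, and resolveUser are assumptions added for illustration; only User, RepositoryLink, and owner: ID come from the description above.

```typescript
type ID = string;

// "Link" type: flat fields only; the owner is an ID, not a User object.
interface RepositoryLink {
  id: ID;
  name: string;
  owner: ID;
}

// Full type returned by a root query; nested repositories are links,
// so the chain User -> RepositoryLink -> ID always ends.
interface User {
  id: ID;
  login: string;
  repositories: RepositoryLink[];
}

// Hypothetical raw records as they might come from a data layer.
interface UserRow {
  id: ID;
  login: string;
}
interface RepositoryRow {
  id: ID;
  name: string;
  ownerId: ID;
}

// Hypothetical resolver: builds the User shape without any way back to User.
function resolveUser(user: UserRow, repos: RepositoryRow[]): User {
  return {
    id: user.id,
    login: user.login,
    repositories: repos
      .filter((r) => r.ownerId === user.id)
      .map((r) => ({ id: r.id, name: r.name, owner: r.ownerId })),
  };
}

const resolved = resolveUser({ id: "u1", login: "alice" }, [
  { id: "r1", name: "alpha", ownerId: "u1" },
  { id: "r2", name: "beta", ownerId: "u2" },
]);
console.log(resolved.repositories);
```

A client that really needs the owner's details then has to send a second root query with that ID, which is exactly the trade-off this approach makes.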

Upvotes: 16

Daniel Rearden

Reputation: 84817

It depends.

It's useful to remember that the same solution can be a good pattern in some contexts and an antipattern in others. The value of a solution depends on the context that you use it. — Martin Fowler

It's a valid point that circular references can introduce additional challenges. As you point out, they are a potential security risk in that they enable a malicious user to craft potentially very expensive queries. In my experience, they also make it easier for client teams to inadvertently overfetch data.

On the other hand, circular references allow an added level of flexibility. Running with your example, if we assume the following schema:

type Query {
  user(id: ID): User
  location(id: ID): Location
}

type User {
  id: ID!
  location: Location
}

type Location {
  id: ID!
  user: User
}

it's clear we could potentially make two different queries to fetch effectively the same data:

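{
  # query 1: start from the user ("1" is a placeholder ID)
  user(id: "1") {
    id
    location {
      id
    }
  }

  # query 2: start from the location ("1" is a placeholder ID)
  location(id: "1") {
    id
    user {
      id
    }
  }
}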

If the primary consumers of your API are one or more client teams working on the same project, this might not matter much. Your front end needs the data it fetches to be of a particular shape and you can design your schema around those needs. If the client always fetches the user, can get the location that way and doesn't need location information outside that context, it might make sense to only have a user query and omit the user field from the Location type. Even if you need a location query, it might still not make sense to expose a user field on it, depending on your client's needs.

On the flip side, imagine your API is consumed by a larger number of clients. Maybe you support multiple platforms, or multiple apps that do different things but share the same API for accessing your data layer. Or maybe you're exposing a public API designed to let third-party apps integrate with your service or product. In these scenarios, your idea of what a client needs is much blurrier. Suddenly, it's more important to expose a wide variety of ways to query the underlying data to satisfy the needs of both current clients and future ones. The same could be said for an API for a single client whose needs are likely to evolve over time.

It's always possible to "flatten" your schema as you suggest and provide additional queries as opposed to implementing relational fields. However, whether doing so is "simpler" for the client depends on the client. The best approach may be to enable each client to choose the data structure that fits their needs.

As with most architectural decisions, there's a trade-off and the right solution for you may not be the same as for another team.

If you do have circular references, all hope is not lost. Some implementations have built-in controls for limiting query depth. GraphQL.js does not, but there are libraries out there like graphql-depth-limit that do just that. It's worth pointing out that breadth can be just as large a problem as depth -- regardless of whether you have circular references, you should look into implementing pagination with a max limit when resolving Lists as well, to prevent clients from potentially requesting thousands of records at a time.
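To show the idea behind a depth limit (this is not how graphql-depth-limit is implemented, and a library may count depth differently), here is a minimal TypeScript sketch over a hand-rolled selection tree. FieldNode, fieldDepth, and assertMaxDepth are hypothetical names; a real server would walk the parsed GraphQL AST instead.

```typescript
// Simplified stand-in for a parsed selection set.
interface FieldNode {
  name: string;
  children?: FieldNode[];
}

// Depth of a field: 1 for a leaf, otherwise 1 + the deepest child.
function fieldDepth(field: FieldNode): number {
  let deepest = 0;
  for (const child of field.children || []) {
    deepest = Math.max(deepest, fieldDepth(child));
  }
  return 1 + deepest;
}

// Reject a query before execution if it nests deeper than allowed.
function assertMaxDepth(root: FieldNode, maxDepth: number): void {
  const depth = fieldDepth(root);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds limit ${maxDepth}`);
  }
}

// user { id location { id user { id } } }
const selection: FieldNode = {
  name: "user",
  children: [
    { name: "id" },
    {
      name: "location",
      children: [{ name: "id" }, { name: "user", children: [{ name: "id" }] }],
    },
  ],
};

console.log(fieldDepth(selection)); // user > location > user > id = 4
```

With a check like this in front of execution, a circular schema stays queryable but a client can only go around the circle a bounded number of times.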

As @DavidMaze points out, in addition to limiting the depth of client queries, you can also use dataloader to mitigate the cost of repeatedly fetching the same record from your data layer. While dataloader is typically used to batch requests to get around the "n+1 problem" that arises from lazily loading associations, it can also help here. In addition to batching, dataloader also caches the loaded records. That means subsequent loads for the same record (inside the same request) don't hit the db but are fetched from memory instead.
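The caching half of that can be illustrated with a tiny stand-in. This is not the dataloader library and leaves out batching entirely; CachingLoader and the fake fetch function are hypothetical, and the sketch assumes one loader instance is created per request.

```typescript
// Minimal per-request cache: the first load for a key calls the fetch
// function; later loads for the same key reuse the same promise.
class CachingLoader<K, V> {
  private cache = new Map<K, Promise<V>>();
  private fetchOne: (key: K) => Promise<V>;

  constructor(fetchOne: (key: K) => Promise<V>) {
    this.fetchOne = fetchOne;
  }

  load(key: K): Promise<V> {
    let pending = this.cache.get(key);
    if (pending === undefined) {
      pending = this.fetchOne(key);
      this.cache.set(key, pending);
    }
    return pending;
  }
}

// Fake data layer that counts how often it is hit.
let dbHits = 0;
const userLoader = new CachingLoader<string, { id: string }>((id) => {
  dbHits++;
  return Promise.resolve({ id });
});

// user -> location -> user: the second lookup of user "1" is served from memory.
const firstLoad = userLoader.load("1");
const secondLoad = userLoader.load("1");
console.log(firstLoad === secondLoad, dbHits);
```

This keeps a circular query from hitting the database repeatedly for the same record, but the response still grows with every level, so it complements a depth or node limit rather than replacing it.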

Upvotes: 27

David Maze

Reputation: 159733

The pattern you show is fairly natural for a "graph" and I don't think it's especially discouraged in GraphQL. The GitHub GraphQL API is the thing I often look at when I wonder "how do people build larger GraphQL APIs", and there are routinely object cycles there: a Repository has a RepositoryOwner, which can be a User, which has a list of repositories.

At least graphql-ruby has a control to limit nesting depth. Apollo doesn't appear to have an obvious equivalent, but you might be able to build a custom data source or use the DataLoader library to avoid repeatedly fetching objects you already have.

Upvotes: 6
