sergelerner

Apollo Server Slow Performance when resolving large data

When resolving large amounts of data, I notice very slow performance between the moment my resolver returns its result and the moment the response reaches the client.

I assume apollo-server iterates over my result and checks the types; either way, this step takes too long.

In my product I have to return a large amount of data at once, because all of it is used together to draw a chart in the UI. Pagination is not an option for me, so I cannot slice the data.

I suspect the slowness comes from apollo-server and not from my resolver's object creation.

Note that I log the time the resolver takes to create the object; it is fast and not the bottleneck.

The operations apollo-server performs afterwards, which I don't know how to measure, take a lot of time.
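One way to measure what happens after the resolver is a request plugin. The following is a sketch that assumes the request-pipeline plugin hooks of Apollo Server 2.x (`requestDidStart`, `executionDidStart`, `willSendResponse`) are available in your version via the `plugins` constructor option; `timingPlugin` and `lastTiming` are hypothetical names.

```javascript
const { performance } = require("perf_hooks");

// Most recent measurement (hypothetical, kept for ad-hoc inspection).
let lastTiming = null;

// Sketch of an Apollo Server 2.x plugin (assumed hook names, see above).
// It measures roughly from the start of GraphQL execution (resolvers plus
// type checking and sub-selection of the result) until just before the
// response is sent. JSON serialization of the body happens later and is
// not included.
const timingPlugin = {
  requestDidStart() {
    const requestStart = performance.now();
    let executionStart = requestStart;

    return {
      executionDidStart() {
        executionStart = performance.now();
      },
      willSendResponse() {
        const end = performance.now();
        lastTiming = {
          executionMs: end - executionStart,
          totalMs: end - requestStart,
        };
        console.log(
          `execution: ${lastTiming.executionMs.toFixed(1)} ms, ` +
            `request total: ${lastTiming.totalMs.toFixed(1)} ms`
        );
      },
    };
  },
};
```

It would be passed as `plugins: [timingPlugin]` next to `typeDefs` and `resolvers`. Comparing `executionMs` with the resolver's own "resolver took" log line shows how much time goes into completing the result rather than building it.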

I also have a version that returns a custom JSON scalar instead, and its response is much, much faster. But I would really prefer to return my Series type.

I measure the difference between the two types (Series and JSON) in the browser's network panel.

With AMOUNT set to 500 and type Series, the request takes ~1.5 s (seconds, not milliseconds).

With AMOUNT set to 500 and type JSON, it takes ~150 ms (fast!).

With AMOUNT set to 1000 and type Series, it is very slow.

With AMOUNT set to 10000 and type Series, I get "JavaScript heap out of memory" (which, unfortunately, is what we experience in our product).
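Part of the explanation is in the resolver itself: `data` holds AMOUNT groups with AMOUNT units each, so the result grows with AMOUNT² rather than AMOUNT. A quick count of the scalar (leaf) fields the Series query asks GraphQL to resolve and check, derived from the resolver and query in the question (`leafFieldCount` is an illustrative helper):

```javascript
// Number of scalar (leaf) fields GraphQL completes for the Series query,
// following the shape built by the `complex` resolver in the question.
function leafFieldCount(amount) {
  const perUnit = 2;                     // Unit.name + Unit.value
  const perGroup = 1 + amount * perUnit; // Group.name + every Unit in Group.values
  const data = amount * perGroup;        // AMOUNT groups in Series.data
  const keys = amount * perUnit;         // AMOUNT units in Series.keys
  return data + keys;
}

for (const amount of [500, 1000, 10000]) {
  console.log(amount, leafFieldCount(amount));
}
```

This gives 501,500 leaf fields for AMOUNT = 500, 2,003,000 for 1000 and 200,030,000 for 10000. Doubling AMOUNT therefore means roughly four times the work, in line with the jump from 1.5 s to 5.4 s in the measurements that follow, and at 10000 the resolver alone allocates over 100 million Unit objects, so raising the heap limit may only postpone the out-of-memory error.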


I've also compared apollo-server with express-graphql. The latter is faster, but still not as fast as returning the custom JSON scalar.

With AMOUNT set to 500: apollo-server takes 1.5 s, express-graphql takes 800 ms (network time).

With AMOUNT set to 1000: apollo-server takes 5.4 s, express-graphql takes 3.4 s (network time).


The Stack:

"dependencies": {
  "apollo-server": "^2.6.1",
  "graphql": "^14.3.1",
  "graphql-type-json": "^0.3.0",
  "lodash": "^4.17.11"
}

The Code:

const _ = require("lodash");
const { performance } = require("perf_hooks");
const { ApolloServer, gql } = require("apollo-server");
const GraphQLJSON = require('graphql-type-json');

// The GraphQL schema
const typeDefs = gql`
  scalar JSON

  type Unit {
    name: String!
    value: String!
  }

  type Group {
    name: String!
    values: [Unit!]!
  }

  type Series {
    data: [Group!]!
    keys: [Unit!]!
    hack: String
  }

  type Query {
    complex: Series
  }
`;

const AMOUNT = 500;

// A map of functions which return data for the schema.
const resolvers = {
  Query: {
    complex: () => {
      let before = performance.now();

      const result = {
        data: _.times(AMOUNT, () => ({
          name: "a",
          values: _.times(AMOUNT, () => (
            {
              name: "a",
              value: "a"
            }
          )),
        })),
        keys: _.times(AMOUNT, () => ({
          name: "a",
          value: "a"
        }))
      };

      const elapsed = performance.now() - before;

      console.log("resolver took (ms): ", elapsed);

      return result
    }
  }
};

const server = new ApolloServer({
  typeDefs,
  resolvers: _.assign({ JSON: GraphQLJSON }, resolvers),
});

server.listen().then(({ url }) => {
  console.log(`🚀 Server ready at ${url}`);
});


The GraphQL query for the Playground (for type Series):

query {
  complex {
    data {
      name
      values {
        name
        value
      }
    }
    keys {
      name
      value
    }
  }
}

The GraphQL query for the Playground (for the custom scalar type JSON, i.e. with `complex: JSON` in the Query type):

query {
  complex
}

Here is a working example:

https://codesandbox.io/s/apollo-server-performance-issue-i7fk7

Any leads/ideas would be highly appreciated!


Answers (2)

Daniel Rearden

There's a related open issue here. Lee Byron summed it up pretty well:

I think the TL;DR of this issue is that GraphQL has some overhead and that reducing that overhead is non-trivial and removing it completely may not be an option. Ultimately GraphQL.js is still responsible for making API boundary guarantees about the shape and type of the returned data and by design does not trust the underlying systems. In other words GraphQL.js does runtime type checking and sub-selection and this has some cost.

The benefits that GraphQL offers (validation, sub-selection, etc.) inevitably incur some overhead as they require additional processing of the data you're returning. And unfortunately, this overhead scales with the size of the data. I imagine if you were to implement a REST endpoint that supported partial responses and did response validation using something like Swagger or Joi, you'd encounter a similar issue.
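To make "this overhead scales with the size of the data" concrete, here is a toy model. It is not the graphql-js implementation, only an illustration of the per-value work (sub-selection plus runtime type and non-null checks) and of why a JSON scalar skips it. All names are hypothetical.

```javascript
// Toy model, NOT graphql-js source. `completions` counts visited values.
let completions = 0;

function complete(value, type) {
  completions += 1;
  if (value == null) {
    throw new Error("Cannot return null for non-nullable field");
  }
  if (type === "String") {
    if (typeof value !== "string") throw new TypeError("Expected a string");
    return value;
  }
  if (type === "JSON") {
    return value; // custom scalar: one opaque leaf, nothing inside is walked
  }
  if (Array.isArray(type)) {
    // List type such as [Unit!]!: complete every item with the item type.
    if (!Array.isArray(value)) throw new TypeError("Expected a list");
    return value.map((item) => complete(item, type[0]));
  }
  // Object type: copy only the requested fields (sub-selection).
  const result = {};
  for (const [field, fieldType] of Object.entries(type)) {
    result[field] = complete(value[field], fieldType);
  }
  return result;
}

// The question's Series query, expressed as a selection for the toy model.
const unitSelection = { name: "String", value: "String" };
const seriesSelection = {
  data: [{ name: "String", values: [unitSelection] }],
  keys: [unitSelection],
};

// Same data shape as the question's `complex` resolver.
function buildSeries(amount) {
  const unit = () => ({ name: "a", value: "a" });
  return {
    data: Array.from({ length: amount }, () => ({
      name: "a",
      values: Array.from({ length: amount }, unit),
    })),
    keys: Array.from({ length: amount }, unit),
  };
}

for (const amount of [10, 100]) {
  completions = 0;
  complete(buildSeries(amount), seriesSelection);
  const typed = completions;
  completions = 0;
  complete(buildSeries(amount), "JSON");
  console.log(`AMOUNT=${amount}: Series ${typed} visits, JSON ${completions}`);
}
```

In this model the Series query visits 3 + 6·AMOUNT + 3·AMOUNT² values (363 at AMOUNT = 10), while the JSON scalar is a single visit regardless of size.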

The "heap out of memory" error means exactly what it says -- you're running out of memory on the heap. You can try to alleviate this by manually increasing the limit.
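To check which limit the process is hitting, and whether a raised limit took effect, Node's built-in `v8` module reports it. The limit itself is raised at startup with the `--max-old-space-size` flag (value in megabytes); `heap_size_limit` roughly reflects that setting plus some young-generation space. The helper names are illustrative.

```javascript
const v8 = require("v8");

// Current V8 heap limit for this process, in megabytes.
// Raise it at startup, e.g. `node --max-old-space-size=4096 index.js`
// or NODE_OPTIONS="--max-old-space-size=4096".
function heapLimitMb() {
  return v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
}

// Heap currently in use by JavaScript objects, in megabytes.
function heapUsedMb() {
  return process.memoryUsage().heapUsed / 1024 / 1024;
}

console.log(
  `heap used: ${heapUsedMb().toFixed(1)} MB of ${heapLimitMb().toFixed(0)} MB`
);
```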

Typically, large datasets like this should be broken up by implementing pagination. If that's not an option, utilizing a custom scalar will be the next best approach. The biggest downside to this approach is that clients consuming your API will not be able to request specific fields inside the JSON object you return. Outside of patching GraphQL.js, there's really no other alternative to speed up the responses and reduce your memory usage.


xadm

Comment summary

These data structures/types:

  • are not individual entities;
  • are just a series of [grouped] data;
  • don't need normalization;
  • won't be normalized properly in the Apollo cache (no id fields).

So this dataset is not what GraphQL was designed for. GraphQL can of course still be used to fetch this data, but type parsing/matching should be bypassed.

Using a custom scalar type (graphql-type-json) can be a solution. If you need a hybrid solution, you can type Group.values as JSON (instead of the entire Series). Groups should still have an id field if you want to use the normalized cache [access].
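A sketch of that hybrid, assuming a new `id: ID!` field on Group (hypothetical, not in the question's schema) and `values` exposed through the JSON scalar. The schema is kept as a plain string here; in the question's setup it would go through `gql` and the `JSON: GraphQLJSON` resolver. `hybridLeafCount` and `buildSeries` are illustrative helpers.

```javascript
// Hypothetical hybrid schema: Group stays typed (with an id for the
// normalized cache) while the bulky `values` list is one JSON leaf.
const typeDefs = `
  scalar JSON

  type Unit {
    name: String!
    value: String!
  }

  type Group {
    id: ID!
    name: String!
    values: JSON!
  }

  type Series {
    data: [Group!]!
    keys: [Unit!]!
  }

  type Query {
    complex: Series
  }
`;

// Leaf fields GraphQL still completes: each group has id + name + one JSON
// leaf for values; each key has name + value.
function hybridLeafCount(amount) {
  return amount * 3 + amount * 2;
}

// Same data shape as the question's resolver, plus the hypothetical id.
function buildSeries(amount) {
  const unit = () => ({ name: "a", value: "a" });
  return {
    data: Array.from({ length: amount }, (_, i) => ({
      id: String(i),
      name: "a",
      values: Array.from({ length: amount }, unit),
    })),
    keys: Array.from({ length: amount }, unit),
  };
}

const resolvers = {
  Query: { complex: () => buildSeries(500) },
};

console.log(hybridLeafCount(500));
```

At AMOUNT = 500 this leaves 2,500 typed leaf fields; the units inside each group's `values` travel untouched as JSON.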

Alternative

You can use apollo-link-rest to fetch 'pure' JSON data (a file), leaving type parsing/matching to the client side only.

More advanced alternative

If you want to use a single GraphQL endpoint, write your own link that uses directives: 'ask for JSON, get typed', a mix of the two approaches above. Something like the de-/serializers in the REST link.
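The 'ask for JSON, get typed' idea can be sketched as a client-side function that takes the Series payload returned through the JSON scalar and re-attaches GraphQL type names, so client code can treat it as typed data. This is a hypothetical helper, not part of apollo-link-rest or any Apollo API; a custom link would run something like it on the response.

```javascript
// Hypothetical client-side deserializer: rebuilds typed objects (with
// __typename) from the raw JSON the server returned as one scalar.
function toTypedUnit(raw) {
  return { __typename: "Unit", name: raw.name, value: raw.value };
}

function toTypedSeries(raw) {
  return {
    __typename: "Series",
    data: raw.data.map((group) => ({
      __typename: "Group",
      name: group.name,
      values: group.values.map(toTypedUnit),
    })),
    keys: raw.keys.map(toTypedUnit),
  };
}

// Example payload in the shape produced by the question's resolver.
const raw = {
  data: [{ name: "a", values: [{ name: "a", value: "a" }] }],
  keys: [{ name: "a", value: "a" }],
};
console.log(JSON.stringify(toTypedSeries(raw)));
```

The per-object work still happens, but on the client instead of in the server's GraphQL execution.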


For both alternatives, ask why you really need it. Just for drawing? It's not worth the effort. There's no pagination, but perhaps streaming (live updates?): no cursors, but loading more (via subscriptions/polling) based on the last update time? Doable, but it doesn't 'feel right'.
