Reputation: 517
When resolving large data sets, I notice very slow performance between the moment my resolver returns its result and the moment the client receives it.
I assume apollo-server iterates over my result and checks the types... either way, the operation takes too long.
In my product I have to return a large amount of data all at once, since it is used all at once to draw a chart in the UI. Pagination is not an option for me, so I cannot slice the data.
I suspect the slowness comes from apollo-server and not from the object creation in my resolver.
Note that I log the time the resolver takes to create the object; it is fast and not the bottleneck.
The later operations performed by apollo-server, which I don't know how to measure, take a lot of time.
I also have a version where I return a custom scalar type JSON, and the response is much, much faster. But I would really prefer to return my Series type.
I measure the difference between the two types (Series and JSON) by looking at the network panel.
When AMOUNT is set to 500 and the type is Series, it takes ~1.5 s (that is, seconds).
When AMOUNT is set to 500 and the type is JSON, it takes ~150 ms (fast!).
When AMOUNT is set to 1000 and the type is Series, it is very slow...
When AMOUNT is set to 10000 and the type is Series, I get "JavaScript heap out of memory" (which is unfortunately what we experience in our product).
I've also compared apollo-server performance to express-graphql. The latter is faster, yet still not as fast as returning a custom scalar JSON.
When AMOUNT is set to 500, with apollo-server the network request takes 1.5 s.
When AMOUNT is set to 500, with express-graphql the network request takes 800 ms.
When AMOUNT is set to 1000, with apollo-server the network request takes 5.4 s.
When AMOUNT is set to 1000, with express-graphql the network request takes 3.4 s.
The Stack:
"dependencies": {
"apollo-server": "^2.6.1",
"graphql": "^14.3.1",
"graphql-type-json": "^0.3.0",
"lodash": "^4.17.11"
}
The Code:
const _ = require("lodash");
const { performance } = require("perf_hooks");
const { ApolloServer, gql } = require("apollo-server");
const GraphQLJSON = require('graphql-type-json');

// The GraphQL schema
const typeDefs = gql`
  scalar JSON

  type Unit {
    name: String!
    value: String!
  }

  type Group {
    name: String!
    values: [Unit!]!
  }

  type Series {
    data: [Group!]!
    keys: [Unit!]!
    hack: String
  }

  type Query {
    complex: Series
  }
`;

const AMOUNT = 500;

// A map of functions which return data for the schema.
const resolvers = {
  Query: {
    complex: () => {
      let before = performance.now();
      const result = {
        data: _.times(AMOUNT, () => ({
          name: "a",
          values: _.times(AMOUNT, () => (
            {
              name: "a",
              value: "a"
            }
          )),
        })),
        keys: _.times(AMOUNT, () => ({
          name: "a",
          value: "a"
        }))
      };
      let after = performance.now() - before;
      console.log("resolver took: ", after);
      return result
    }
  }
};

const server = new ApolloServer({
  typeDefs,
  resolvers: _.assign({ JSON: GraphQLJSON }, resolvers),
});

server.listen().then(({ url }) => {
  console.log(`🚀 Server ready at ${url}`);
});
The gql Query for the Playground (for type Series):
query {
  complex {
    data {
      name
      values {
        name
        value
      }
    }
    keys {
      name
      value
    }
  }
}
The gql Query for the Playground (for custom scalar type JSON):
query {
  complex
}
Here is a working example:
https://codesandbox.io/s/apollo-server-performance-issue-i7fk7
Any leads/ideas would be highly appreciated!
Upvotes: 11
Views: 13194
Reputation: 84807
There's a related open issue here. Lee Byron summed it up pretty well:
I think the TL;DR of this issue is that GraphQL has some overhead and that reducing that overhead is non-trivial and removing it completely may not be an option. Ultimately GraphQL.js is still responsible for making API boundary guarantees about the shape and type of the returned data and by design does not trust the underlying systems. In other words GraphQL.js does runtime type checking and sub-selection and this has some cost.
The benefits that GraphQL offers (validation, sub-selection, etc.) inevitably incur some overhead as they require additional processing of the data you're returning. And unfortunately, this overhead scales with the size of the data. I imagine if you were to implement a REST endpoint that supported partial responses and did response validation using something like Swagger or Joi, you'd encounter a similar issue.
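To get a feel for how that overhead scales, here is a rough sketch that counts how many scalar values have to be completed one by one for the Series shape from the question. The helper names (buildSeries, countLeaves) are made up for illustration, and the count is only a proxy for the work GraphQL.js does, not a measurement of it.

```javascript
// Build the same shape as the resolver in the question, without lodash.
// buildSeries and countLeaves are hypothetical helpers for this sketch.
function buildSeries(amount) {
  const unit = () => ({ name: "a", value: "a" });
  return {
    data: Array.from({ length: amount }, () => ({
      name: "a",
      values: Array.from({ length: amount }, unit),
    })),
    keys: Array.from({ length: amount }, unit),
  };
}

// Recursively count scalar leaves: each one is a value that a typed
// schema has to check and serialize individually.
function countLeaves(node) {
  if (Array.isArray(node)) {
    return node.reduce((sum, item) => sum + countLeaves(item), 0);
  }
  if (node !== null && typeof node === "object") {
    return Object.values(node).reduce((sum, v) => sum + countLeaves(v), 0);
  }
  return 1;
}

for (const amount of [10, 100, 500]) {
  console.log(`AMOUNT=${amount}: ${countLeaves(buildSeries(amount))} scalar values`);
}
```

For this shape the count works out to 2·AMOUNT² + 3·AMOUNT, so AMOUNT = 500 already means 501,500 values, and AMOUNT = 10000 would mean roughly 200 million. With a JSON scalar the whole object is handed over as a single value instead.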
The "heap out of memory" error means exactly what it says -- you're running out of memory on the heap. You can try to alleviate this by manually increasing the limit.
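As a small sketch of what "increasing the limit" looks like in practice: Node accepts the V8 flag `--max-old-space-size=<MiB>`, for example `node --max-old-space-size=4096 index.js` (the file name and the 4096 value are just assumptions here). The snippet below only reads the limit currently in effect via the built-in `v8` module, so you can confirm the flag was picked up; `heapLimitMiB` is a hypothetical helper name.

```javascript
const v8 = require("v8");

// heap_size_limit is reported in bytes; convert to MiB for readability.
function heapLimitMiB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMiB()} MiB`);
```

Raising the limit only buys headroom. The memory used still grows with the size of the response, so this postpones the error rather than removing its cause.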
Typically, large datasets like this should be broken up by implementing pagination. If that's not an option, utilizing a custom scalar will be the next best approach. The biggest downside to this approach is that clients consuming your API will not be able to request specific fields inside the JSON object you return. Outside of patching GraphQL.js, there's really no other alternative to speed up the responses and reduce your memory usage.
Upvotes: 17
Reputation: 8418
Comment summary
This data structure/types: ... id fields); as such, this dataset is not what GraphQL was designed for. Of course, GraphQL can still be used to fetch this data, but type parsing/matching should be disabled.
Using custom scalar types (graphql-type-json) can be a solution. If you need a hybrid solution, you can type Group.values as JSON (instead of the entire Series). Groups should still have an id field if you want to use the normalized cache [access].
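A minimal sketch of what that hybrid schema might look like, assuming the type names from the question. It is written as a plain string so it runs on its own; in the question's setup it would be passed to gql. The id field on Group is an assumption that the question's resolver does not currently provide, and hybridTypeDefs is a made-up name.

```javascript
// Hybrid schema sketch: only Group.values is an untyped JSON scalar,
// so Series and Group stay typed and selectable.
const hybridTypeDefs = `
  scalar JSON

  type Unit {
    name: String!
    value: String!
  }

  type Group {
    id: ID!        # assumed field, only needed for a normalized client cache
    name: String!
    values: JSON!  # returned as-is, no per-Unit type checking
  }

  type Series {
    data: [Group!]!
    keys: [Unit!]!
  }

  type Query {
    complex: Series
  }
`;

console.log(hybridTypeDefs);
```

The trade-off is the same as with a fully untyped response, just narrower: clients can still select name on each Group, but can no longer sub-select fields inside values.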
You can use apollo-link-rest to fetch "pure" JSON data (a file), leaving type parsing/matching to the client side only.
If you want to use a single GraphQL endpoint, you can write your own link that uses directives ("ask for JSON, get typed"), a mix of the two approaches above. Something like the REST link with de-/serializers.
For both alternatives: why do you really need it? Just for drawing? Not worth the effort. No pagination, but hopefully streaming (live updates?) ... no cursors ... load more (subscriptions/polling) by ... last update time? Doable, but it does not feel right.
Upvotes: 2