Reputation: 82
In my graph there are approximately 196 000 C nodes, 600 000 A nodes, and 800 000 S nodes. 99% of C's are connected to a single A (with each A having anywhere from 0 - 20 Cs related), and all A's are connected to a single S.
I am running the following query
MATCH (c:C)<-[d:D]-(:A)<-[:u]-(s:S)
WITH s, d, c,
CASE WHEN c.start - 1 - 20000 < 0
THEN 0
ELSE c.start - 1 - 20000 END AS start
RETURN s.r, c.type, d.index,
substring(s.se, start, c.end-start + 1 + 20000);
It runs for around 2.5 hours, and then I get this response:
{
"message" : "The statement has been closed.",
"exception" : "BadInputException",
"fullname" : "org.neo4j.server.rest.repr.BadInputException",
"stacktrace" : [ "org.neo4j.server.rest.repr.RepresentationExceptionHandlingIterable.exceptionOnHasNext(RepresentationExceptionHandlingIterable.java:50)", "org.neo4j.helpers.collection.ExceptionHandlingIterable$1.hasNext(ExceptionHandlingIterable.java:46)", "org.neo4j.helpers.collection.IteratorWrapper.hasNext(IteratorWrapper.java:42)", "org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:71)", "org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)", "org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)", "org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:83)", "org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:41)", "org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:215)", "org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:147)", "org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:130)", "org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:67)", "org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:101)", "java.lang.reflect.Method.invoke(Method.java:606)", "org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)", "org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)", "java.lang.Thread.run(Thread.java:745)" ],
"cause" : {
"message" : "The statement has been closed.",
"exception" : "NotInTransactionException",
I am just running this query via curl as follows
curl -g -H Accept:application/json -H Content-Type:application/json -X POST -d '{ "query":"MATCH (c:C)<-[d:D]-(:A)<-[:u]-(s:S) WITH s, d, c, CASE WHEN c.start - 1 - 20000 < 0 THEN 0 ELSE c.start - 1 - 20000 END AS start RETURN s.r, c.type, d.index, substring(s.se, start, c.end-start + 1 + 20000);", "params" : {} }' localhost:7474/db/data/cypher -o data.json
I have added "limit 3;" to the query and it does run and return expected results.
Have I not properly optimized the query? I have read about query optimization and can't see anything I could improve, although I bet there is something. I also cannot find much documentation on that exception.
Any help would be great! Thanks
Edit: fixed typo
Edit: I reran the same query with an additional "WHERE c.prop = 'x'" to limit the initial C matching and it then returned an OutOfMemory Exception. I then did some more reading and came across this from Michael's post here. My query is now running and I think it is working. (There is a lot of data and it is downloading it to a file that is increasing in size.)
Upvotes: 0
Views: 51
Reputation: 41676
Which Neo4j version are you using?
I think you're creating billions and billions of paths.
To look at the cardinalities:
(c:C*196k)<-[d:D*1..20]-(:A*600k)<-[:u*1..1]-(s:S*800k)
Profile your statement. I think it makes sense to start from C and follow the path from there to the single A and the single S.
You can use USING SCAN c:C to force Cypher to start with a label scan over the C nodes, which should give you about 196k paths. Each of those C nodes would then be matched along its single-node path.
So try @FrobberOfBits' suggestion, along with profiling and adding a LIMIT to the first WITH, to check that the correct data is returned.
See: http://neo4j.com/docs/stable/query-using.html#using-hinting-a-label-scan
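As a rough sketch of what I mean (not tested against your data): this assumes your Neo4j version accepts the PROFILE prefix, and the LIMIT of 100 is just a placeholder sample size I picked so the query finishes quickly. The labels, relationship types, and properties are the ones from your question.

```cypher
// PROFILE prints the execution plan with row counts per step;
// drop it if your version or endpoint does not accept the prefix.
PROFILE
MATCH (c:C)
USING SCAN c:C            // hint: begin with a label scan over :C
WITH c LIMIT 100          // placeholder sample size for a first check
MATCH (c)<-[d:D]-(:A)<-[:u]-(s:S)
RETURN s.r, c.type, d.index;
```

If the plan shows the scan starting at C and the row counts staying close to the number of C nodes, you can remove the LIMIT and add your substring expression back in.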
Upvotes: 0
Reputation: 18002
So you're trying to match a LOT of different paths, and I think you probably are doing more computation than is necessary. You might want to try this reformulation:
MATCH (c:C)
WITH c, CASE WHEN c.start - 20001 < 0
THEN 0 ELSE c.start - 20001 END AS start
MATCH (c)<-[d:D]-(:A)<-[:u]-(s:S)
WITH c, start, s, d
RETURN s.r, c.type, d.index, substring(s.se, start, c.end - start + 20001);
My thought here is that C is the least numerous of your node types. So start the match there and do the arithmetic first, then base the subsequent matches on that result. Otherwise you re-match c many extra times, depending on how many of the other nodes there are. You could break this down further by matching the next-most-selective label, A, in its own step with an additional WITH clause. I think this will help.
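Breaking it down one step further might look roughly like this. It is only a sketch: the variable a is a name I am introducing for illustration, everything else comes from your query, and I have not profiled it against your graph.

```cypher
MATCH (c:C)
WITH c, CASE WHEN c.start - 20001 < 0
        THEN 0 ELSE c.start - 20001 END AS start
// First hop only: each C should reach at most one A.
MATCH (c)<-[d:D]-(a:A)
WITH c, start, d, a
// Second hop from the A found above to its single S.
MATCH (a)<-[:u]-(s:S)
RETURN s.r, c.type, d.index, substring(s.se, start, c.end - start + 20001);
```

Whether the extra WITH actually changes the plan depends on the planner in your version, so it is worth comparing the profiles of both forms.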
Upvotes: 1