prees

Reputation: 82

Cypher query BadInputException

In my graph there are approximately 196 000 C nodes, 600 000 A nodes, and 800 000 S nodes. 99% of C's are connected to a single A (with each A having anywhere from 0 - 20 Cs related), and all A's are connected to a single S.

I am running the following query

MATCH (c:C)<-[d:D]-(:A)<-[:u]-(s:S) 
WITH s, d, c, 
    CASE WHEN c.start - 1 - 20000 < 0 
    THEN 0 
    ELSE c.start - 1 - 20000 END AS start 
RETURN s.r, c.type, d.index, 
       substring(s.se, start, c.end-start + 1 + 20000);

It runs for around 2.5 hours, and then I get this response:

    {
  "message" : "The statement has been closed.",
  "exception" : "BadInputException",
  "fullname" : "org.neo4j.server.rest.repr.BadInputException",
  "stacktrace" : [ "org.neo4j.server.rest.repr.RepresentationExceptionHandlingIterable.exceptionOnHasNext(RepresentationExceptionHandlingIterable.java:50)", "org.neo4j.helpers.collection.ExceptionHandlingIterable$1.hasNext(ExceptionHandlingIterable.java:46)", "org.neo4j.helpers.collection.IteratorWrapper.hasNext(IteratorWrapper.java:42)", "org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:71)", "org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)", "org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)", "org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:83)", "org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:41)", "org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:215)", "org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:147)", "org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:130)", "org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:67)", "org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:101)", "java.lang.reflect.Method.invoke(Method.java:606)", "org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139)", "org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)", "java.lang.Thread.run(Thread.java:745)" ],
  "cause" : {
    "message" : "The statement has been closed.",
    "exception" : "NotInTransactionException",

I am just running this query via curl as follows

curl -g -H Accept:application/json -H Content-Type:application/json -X POST  -d '{ "query":"MATCH (c:C)<-[d:D]-(:A)<-[:u]-(s:S) WITH s, d, c, CASE WHEN c.start - 1 - 20000 < 0 THEN 0 ELSE c.start - 1 - 20000 END AS start RETURN s.r, c.type, d.index, substring(s.se, start, c.end-start + 1 + 20000);", "params" : {} }'  localhost:7474/db/data/cypher -o data.json

When I add "LIMIT 3" to the query, it runs and returns the expected results.

Have I not properly optimized the query? I have read about query optimization and can't see anything I could improve, although I bet there is something. I cannot find much documentation on solving that exception either.

Any help would be great! Thanks

Edit: fixed typo

Edit: I reran the same query with an additional "WHERE c.prop = 'x'" to limit the initial C matching, and it then returned an OutOfMemory exception. I then did some more reading and came across this from Michael's post here. My query is now running and I think it is working. (There is a lot of data, and it is being downloaded to a file that keeps growing.)

Upvotes: 0

Views: 51

Answers (2)

Michael Hunger

Reputation: 41676

Which Neo4j version are you using?

I think you're creating billions and billions of paths.

To look at the cardinalities:

(c:C*196k)<-[d:D*1..20]-(:A*600k)<-[:u*1..1]-(s:S*800k) 

Profile your statement. I think it makes sense to have it start from a C and follow the path to the single A and the single S from there.

So you can use USING SCAN c:C to force Cypher to scan the C nodes via the label index, which should give you about 196k paths.

Each of those C nodes would then be matched along its single path to one A and one S.
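As a rough sanity check on those numbers, here is a back-of-the-envelope sketch in Python. It assumes each C reaches exactly one A and one S (the question says this holds for about 99% of C's) and that every returned substring covers the feature plus 20,001 characters on each side, as the query's arithmetic implies when no window is clamped. The function names are made up for illustration.

```python
# Figures from the question; everything here is a rough estimate, not a measurement.
C_NODES = 196_000        # approximate number of C nodes
PATHS_PER_C = 1          # assumption: each C reaches one A, and that A one S
FLANK = 20_001           # characters added on each side of the feature by the query


def estimated_rows(c_nodes=C_NODES, paths_per_c=PATHS_PER_C):
    """Rows returned when the match is anchored on C."""
    return c_nodes * paths_per_c


def estimated_chars(rows, feature_len=0, flank=FLANK):
    """Characters returned across all rows if no window is clamped or truncated."""
    return rows * (feature_len + 2 * flank)


rows = estimated_rows()
print(rows)                   # 196000
print(estimated_chars(rows))  # 7840392000
```

Several billion characters of substring data would be consistent with the OutOfMemory error and the large output file mentioned in the question, though the real figure depends on the actual feature lengths.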

So try @FrobberOfBits's suggestion along with profiling and limiting the first WITH to see if the correct data is returned.

See: http://neo4j.com/docs/stable/query-using.html#using-hinting-a-label-scan

Upvotes: 0

FrobberOfBits

Reputation: 18002

So you're trying to match a LOT of different paths, and I think you probably are doing more computation than is necessary. You might want to try this reformulation:

MATCH (c:C)
WITH c, CASE WHEN c.start - 20001 < 0
           THEN 0 ELSE c.start - 20001 END AS start
MATCH (c)<-[d:D]-(:A)<-[:u]-(s:S)
WITH c, start, s, d
RETURN s.r, c.type, d.index, substring(s.se, start, c.end - start + 20001);

My thought here is that you have fewer C nodes than any other kind of node. So start the match there, do your math first, and then base the subsequent matches on that. Otherwise you re-match c many extra times, depending on how many of the other nodes there are. You could break this down further, based on the next-most-selective A, with an additional WITH clause. I think this will help.
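To check the window math outside the database, here is a small Python sketch of the same arithmetic. The names window and extract are hypothetical, and it assumes the offsets are 0-based as in Cypher's substring. Python slicing silently truncates at the end of the string, so confirm how substring behaves in your Neo4j version before relying on that.

```python
FLANK = 20_000  # flanking characters requested on each side, taken from the query


def window(c_start, c_end, flank=FLANK):
    """Return (start, length) as the query's CASE and substring arguments compute them."""
    start = max(0, c_start - 1 - flank)   # CASE WHEN ... < 0 THEN 0 ELSE ... END
    length = c_end - start + 1 + flank    # third argument passed to substring()
    return start, length


def extract(se, c_start, c_end, flank=FLANK):
    """Slice the sequence string using the computed window."""
    start, length = window(c_start, c_end, flank)
    return se[start:start + length]       # Python slicing truncates at the end of se


print(window(100, 200))                    # (0, 20201): start clamped to 0
print(window(50_000, 50_100))              # (29999, 40102)
print(extract("abcdefghij", 5, 6, flank=2))  # cdefghi
```

Running a few known c.start and c.end values through this and comparing against a LIMIT-ed query is a cheap way to confirm the reformulated query returns the same substrings as the original.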

Upvotes: 1
