Reputation: 75
I have the following graph structure
Java version
Neo4j version
Machine
Problem
I am experimenting with the three queries below. Query #1 takes 16 seconds, #2 takes 8 minutes, and #3 is "crashing". Both #2 and #3 push all available CPU cores to ~90% usage. I am running the queries in the web interface (and I will use the REST API to integrate the app with Neo4j).
I would like to know what is wrong with those queries and how I could optimise them. I am currently using the default settings.
Query #1:

START root=node:source(id="2")
MATCH root-[]->movies<-[]-others
WITH COUNT(movies) as movie_count, others as others
RETURN others.id, movie_count
ORDER BY movie_count DESC
LIMIT 10

Query #2:

START root=node:source(id="2")
MATCH root-[]->stuff<-[]-others
WITH DISTINCT(others) as dothers
MATCH dothers-[]->different
RETURN different.id, COUNT(different) as count
ORDER BY count DESC
LIMIT 10

Query #3:

START root=node:source(id="2")
MATCH root-[*1..1]->stuff<-[*1..1]-other-[*1..1]->different
WHERE stuff.id <> different.id
WITH COUNT(different) as different_count, different as different
RETURN different.id, different_count
ORDER BY different_count DESC
LIMIT 10
Upvotes: 2
Views: 2067
Reputation: 39905
If performance matters, please use the latest stable version of Neo4j (1.9.x at the time of writing this answer).
2.0.0.M03 is a milestone build and not yet optimized. So far the focus has been on feature completeness with regard to the new concept of labels and label-based indexing.
Upvotes: 1
Reputation: 33145
Disclaimer: This advice is for 1.8 and 1.9. If you're using 2.0 or 2.1, these comments may no longer be valid.
Query 1: Make your WITH your RETURN (do the COUNT aggregation directly in the RETURN clause) and skip that extra step.
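As a rough sketch of that suggestion, Query 1 with the WITH folded into the RETURN might look like the following. It assumes the 1.9 syntax and the same "source" index as in the question, and it has not been run against the asker's data.

```cypher
// Query 1 with the aggregation moved directly into RETURN (no WITH step)
START root=node:source(id="2")
MATCH root-->movies<--others
RETURN others.id, COUNT(movies) AS movie_count
ORDER BY movie_count DESC
LIMIT 10
```

The grouping key (others.id) and the aggregate are unchanged, so the result should match the original query.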
Query 2: Don't use DISTINCT in the WITH as you do now. Go as far as you can without DISTINCT. This looks like a premature optimization: it stops the query from being lazy, because it has to store many more intermediate results to compute the WITH.
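One possible reading of that advice, as a sketch only, is to match the whole path in a single MATCH and push DISTINCT into the aggregate. This is an assumption about what was meant, not a verified equivalent: the counts may differ from the original when there are parallel relationships, or when "different" is also one of the "stuff" nodes.

```cypher
// Query 2 without the intermediate WITH DISTINCT (assumed rewrite).
// COUNT(DISTINCT others) counts each neighbouring node once per "different".
START root=node:source(id="2")
MATCH root-->stuff<--others-->different
RETURN different.id, COUNT(DISTINCT others) AS count
ORDER BY count DESC
LIMIT 10
```

Comparing the output with the original on a small root node first is a cheap way to check whether the semantics are close enough for your use case.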
Query 3: Don't use -[*1..1]->. It means the same as -[]-> or -->, but it triggers the slower matcher for variable-length paths when the query only needs adjacent nodes and could use the fast matcher. Also make the WITH your RETURN, which removes the extra pipe the results have to pass through and lets the query be lazier (although the ORDER BY makes it hard to be lazy). See if you can get it to complete without the ORDER BY.
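Applied to Query 3, those two changes might look like the sketch below. It again assumes 1.9 syntax and the index from the question, and is untested against the asker's graph.

```cypher
// Query 3 with plain --> relationships instead of [*1..1],
// and the aggregation done in RETURN rather than in a separate WITH
START root=node:source(id="2")
MATCH root-->stuff<--other-->different
WHERE stuff.id <> different.id
RETURN different.id, COUNT(different) AS different_count
ORDER BY different_count DESC
LIMIT 10
```

To try the last suggestion, run the same query with the ORDER BY line removed and see whether it completes.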
If you need faster responses and can't squeeze them out of your queries with these recommendations, you may need to turn to the Java API until Cypher performance improves in 2.x. The unmanaged extension approach makes such Java code easy to call from the REST interface.
Upvotes: 1