LoveTW

Reputation: 3842

Why does a node's label affect query performance so significantly in Neo4j?

To simplify my question: if all nodes in a Neo4j database have the same label, Science, what is the difference between MATCH n WHERE n.ID="UUID-0001" RETURN n and MATCH (n:Science) WHERE n.ID="UUID-0001" RETURN n? Why is the performance not the same?

My Neo4j database contains about 70,000 nodes and 100 relationships.

The nodes are of two types, Paper and Author, and both have an ID property.

I created each node with its corresponding label, and I also indexed the ID property.

However, one of my functions needs to query nodes by ID regardless of their label. The query looks like this: MATCH n WHERE n.ID="UUID-0001" RETURN n. It takes about 4000~5000 ms!

But after adding the label Science to every node and using MATCH (n:Science) WHERE n.ID="UUID-0001" RETURN n, the query time dropped to about 1000~1100 ms. Does anyone know what accounts for the difference between these two cases?

PS. Count(n:Science) = Count(n:Paper) + Count(n:Author), which means each node has two labels.

Upvotes: 3

Views: 672

Answers (2)

LoveTW

Reputation: 3842

I received advice from @phil_20686 and @Michael Hunger, but I don't think these answers resolve my question.

I think there is some trick to using labels. If there are 10 thousand nodes in a Neo4j database and they are all of the same type, the query will still perform better when a label is added to these nodes.

I hope this post helps some people. If you find the reason, please give me some feedback. Thanks.

Upvotes: 0

phil_20686

Reputation: 4080

Because Neo4j automatically maintains an extra index for every label. The Cypher language can be broadly thought of as piping plus filtering, so MATCH n WHERE ... will first get every node and then filter on the WHERE part, whereas MATCH (n:Science) WHERE ... will get every node with the label Science (using an index) and then try to match the WHERE. From your query performance we can see that about 1/5th of your nodes were marked Science, so the query runs in a fifth of the time, because it did a fifth as many comparisons.
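
To make the "piping plus filtering" picture concrete, here is a minimal toy model in Python. It is only a sketch of the reasoning above, not how Neo4j actually stores or scans nodes: the ToyGraph class, its match method, and the 10 Paper / 40 Author split are all made up for illustration. It counts how many property comparisons each strategy performs.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory stand-in for a graph store (not Neo4j's engine)."""

    def __init__(self):
        self.nodes = []                       # every node, in insertion order
        self.label_index = defaultdict(list)  # label -> nodes carrying that label

    def add(self, labels, **props):
        node = {"labels": set(labels), "props": props}
        self.nodes.append(node)
        for label in labels:
            self.label_index[label].append(node)
        return node

    def match(self, prop, value, label=None):
        """Return (matching nodes, number of property comparisons made)."""
        # Without a label we must pipe in every node; with one, only the
        # nodes listed under that label are piped into the filter.
        candidates = self.nodes if label is None else self.label_index[label]
        comparisons = 0
        hits = []
        for node in candidates:
            comparisons += 1
            if node["props"].get(prop) == value:
                hits.append(node)
        return hits, comparisons


# Assumed data: 50 nodes, one fifth of them labelled Paper.
graph = ToyGraph()
for i in range(10):
    graph.add(["Paper"], ID=f"UUID-{i:04d}")
for i in range(10, 50):
    graph.add(["Author"], ID=f"UUID-{i:04d}")

all_hits, all_cmp = graph.match("ID", "UUID-0001")                     # like MATCH n WHERE ...
paper_hits, paper_cmp = graph.match("ID", "UUID-0001", label="Paper")  # like MATCH (n:Paper) WHERE ...
print(all_cmp, paper_cmp)  # 50 10
```

Both calls find the same node, but the labelled one only compares against the fifth of the nodes that carry the label. If you want to see what the real planner does, prefixing the Cypher query with PROFILE should show which scan it picks.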

Upvotes: 4
