Cassie

Merge query optimization Neo4j

I want to create relationships between two kinds of nodes, where there are only a few unique pairs and all the others are repeated. Probably because there are only a few unique nodes, the import tool is not able to create the relationships. However, when I run a relationship-creation query in the shell, it takes too long. How can I optimize this query, for example by filtering for uniqueness?

```cypher
MATCH (a:Applications), (sms:Sms {id: a.application_id})
MERGE (a)-[r:APP_SMS]->(sms)
RETURN distinct a.application_id, sms.id
```

The only option I have found is to use DISTINCT in the RETURN clause of the query.

I ran the same query with PROFILE and LIMIT 25 to see the query plan and results:

10382692 total db hits in 3219 ms

query plan

Answers (1)

InverseFalcon

As stdob-- suggested, you need to create an index on :Sms(id) so that looking up the sms node becomes a cheap NodeIndexSeek instead of a NodeByLabelScan.
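A minimal sketch of that fix, assuming Neo4j 4.x or later (on 3.x the equivalent is `CREATE INDEX ON :Sms(id)`). The index name `sms_id` is hypothetical. The relationship query is also split into two MATCH clauses so the per-row lookup on `:Sms(id)` is explicit, and the RETURN is dropped because it isn't needed to create the relationships. Run each statement separately (e.g. in cypher-shell), since a schema change and data writes can't share one transaction.

```cypher
// Index on :Sms(id); "sms_id" is a hypothetical name (Neo4j 4.x+ syntax)
CREATE INDEX sms_id FOR (s:Sms) ON (s.id);

// Index population is asynchronous: wait until it is online
CALL db.awaitIndexes();

// One index lookup per :Applications node instead of a label scan of :Sms
MATCH (a:Applications)
MATCH (sms:Sms {id: a.application_id})
MERGE (a)-[:APP_SMS]->(sms);
```

Prefixing the last statement with PROFILE should then show an index seek rather than NodeByLabelScan for `sms`. The scan over `:Applications` remains, since every such node has to be visited anyway.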

That you only have 25 rows after the DISTINCT operation is a little concerning. id fields usually imply uniqueness for nodes of a label, but that doesn't seem to be the case here. For the nodes that share the same id property, are these duplicate nodes with the same properties, or do the other properties aside from id differ? If you have duplicate nodes in the db, that suggests a modeling problem.
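One way to check is to count :Sms nodes per id, keep only the ids that occur more than once, and then pull the full property maps of those nodes for comparison. This is a sketch that assumes `properties()` is available (Neo4j 3.1+). The `LIMIT 10` is an arbitrary cap, and the second MATCH benefits from the `:Sms(id)` index if one exists.

```cypher
// Count nodes per id and keep only ids shared by more than one :Sms node
MATCH (s:Sms)
WITH s.id AS id, count(*) AS nodeCount
WHERE nodeCount > 1
WITH id, nodeCount
ORDER BY nodeCount DESC
LIMIT 10
// Fetch the nodes behind each shared id and collect their property maps
MATCH (d:Sms {id: id})
RETURN id, nodeCount, collect(properties(d)) AS nodeProps
ORDER BY nodeCount DESC;
```

If the collected maps are identical within a row, those are true duplicates. If they differ, `id` simply isn't a unique key for `:Sms`.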

EDIT

Per the comments, you added LIMIT 25 to the query, so getting only 25 rows after DISTINCT makes sense.

There shouldn't be a duplication issue here, unless id isn't unique across :Sms nodes.

Beyond the index, I'm not sure there's much more that can be optimized. You could try batching the relationship MERGE with apoc.periodic.iterate(), but you should run it without parallelization to avoid locking issues.
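A sketch of the batched version, assuming the APOC plugin is installed. `batchSize: 10000` is an arbitrary starting value to tune. The outer statement streams the :Applications nodes. The inner statement runs for each row, with `a` bound from the outer RETURN, and the work is committed every `batchSize` rows. `parallel: false` is APOC's default and is set explicitly here. Because many :Applications nodes point at the same few :Sms nodes, parallel batches would contend for locks on those shared endpoints.

```cypher
// Outer statement: stream :Applications nodes.
// Inner statement: look up the matching :Sms node and MERGE the relationship;
// each batch of rows is committed in its own transaction.
CALL apoc.periodic.iterate(
  "MATCH (a:Applications) RETURN a",
  "MATCH (sms:Sms {id: a.application_id}) MERGE (a)-[:APP_SMS]->(sms)",
  {batchSize: 10000, parallel: false}
);
```

The procedure returns a summary row, including the number of batches and any failed operations, which is worth checking after the run.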
