Mike Holdsworth

Reputation: 1108

Copying a property from a node is slow with a lot of nodes

I'm migrating some properties on nodes with a given label, and the query performance is very poor.

The old property is callerRef and the new property is code. There are 17 million nodes that need to be updated, and I want to process them in batches. The absence of the code property on an entity indicates that it has not yet been migrated.

PROFILE
MATCH (e:Entity)
WHERE NOT HAS(e.code)
WITH e
LIMIT 1000000
SET e.key = e.callerKeyRef, e.code = e.callerRef;

There is one index on the Entity label, and it is on code.

schema -l :Entity
Indexes
  ON :Entity(code) ONLINE
No constraints

The heap has 8 GB allocated, and I'm running Neo4j 2.2.4. The problem, if I'm reading the plan right, is that ALL nodes with the label are being hit even though a LIMIT clause is specified. I would have thought that, in an unordered query with a limit, processing would stop once the limit has been met.

+-------------------+
| No data returned. |
+-------------------+
Properties set: 2000000
870891 ms

Compiler CYPHER 2.2

+-------------+----------+----------+-------------+--------------------------+
|    Operator |     Rows |   DbHits | Identifiers |                    Other |
+-------------+----------+----------+-------------+--------------------------+
| EmptyResult |        0 |        0 |             |                          |
| UpdateGraph |  1000000 |  6000000 |           e | PropertySet; PropertySet |
|       Eager |  1000000 |        0 |           e |                          |
|       Slice |  1000000 |        0 |           e |             {  AUTOINT0} |
|      Filter |  1000000 | 16990200 |           e |     NOT(hasProp(e.code)) |
| NodeByLabel | 16990200 | 16990201 |           e |                  :Entity |
+-------------+----------+----------+-------------+--------------------------+

Total database accesses: 39980401

Am I missing something obvious? TIA

Upvotes: 1

Views: 62

Answers (2)

FylmTM

Reputation: 2007

Indexes are supported only for = and IN (which are basically the same, because the Cypher compiler transforms all = operations into IN).

Neo4j is a schema-less database, so if a node has no such property, there is no index entry for it. That's why it needs to scan all nodes.

My suggestions:

  • First step: add the code property, with some default "falsy" value, to all the nodes that need it
  • Then run the update with a node.code = "none" WHERE clause
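
A minimal sketch of those two steps, reusing the label and property names from the question. The sentinel value "none" is only an example and has to be a value that never occurs as a real code. Note that the first statement still has to scan every :Entity node once, so it may itself need to be run in batches; the second should then be able to use the existing index on :Entity(code).

```cypher
// Step 1 (one-off): mark every not-yet-migrated node with a sentinel value.
// This still scans the whole label, but only once.
MATCH (e:Entity)
WHERE NOT HAS(e.code)
SET e.code = "none";

// Step 2 (repeat until no properties are set): migrate one batch.
// The equality predicate can be answered from the :Entity(code) index.
MATCH (e:Entity)
WHERE e.code = "none"
WITH e
LIMIT 1000000
SET e.key = e.callerKeyRef, e.code = e.callerRef;
```

Running the second statement with PROFILE should show whether the planner picks an index seek instead of NodeByLabel.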

Upvotes: 2

cybersam

Reputation: 67009

It might be faster to first assign a new label, say ToDo, to all the nodes that have yet to be migrated:

MATCH (e:Entity)
WHERE NOT HAS(e.code)
SET e:ToDo;

Then, you can iteratively match 1000000 (or whatever) ToDo nodes at a time, removing the ToDo label after migrating each node:

MATCH (e:ToDo)
WITH e
LIMIT 1000000
SET e.key = e.callerKeyRef, e.code = e.callerRef
REMOVE e:ToDo;
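
To know when to stop repeating the batch, one option (a sketch, not part of the approach above as written) is to count the nodes that still carry the temporary label. The alias remaining is just an illustrative name.

```cypher
// Number of nodes still waiting to be migrated.
// When this returns 0, every batch has been processed.
MATCH (e:ToDo)
RETURN count(e) AS remaining;
```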

Upvotes: 1
