Reputation: 9369
I have 130 million nodes with label Snp. I want to convert the property position from string to int for all nodes. I'm using Neo4j 3.0.4 with APOC version 3.0.4.1.
Due to the large number of nodes this has to be done in batches. I tried the apoc.periodic.rock_n_roll() procedure for this:
CALL apoc.periodic.rock_n_roll(
'MATCH (n:Snp) WITH n RETURN id(n) AS id_n',
'MATCH (n:Snp) WHERE id(n)={id_n} SET n.position = toInt(n.position)',
20000
)
I thought this would match all nodes in batches and then call the second query for each batch. Instead it blocks Neo4j with frequent GC pauses and growing memory usage, and the procedure had not finished after 3 hours.
It works if the first MATCH is limited; the following takes ~20 seconds:
CALL apoc.periodic.rock_n_roll(
'MATCH (n:Snp) WITH n LIMIT 1000000 RETURN id(n) AS id_n',
'MATCH (n:Snp) WHERE id(n)={id_n} SET n.position = toInt(n.position)',
20000
)
However, I don't think that is how the procedure is meant to be used. Can I use it differently to convert a property on a large set of nodes?
Upvotes: 3
Views: 878
Reputation: 67009
The first Cypher statement you pass to the apoc.periodic.rock_n_roll procedure will attempt to get all 130 million Snp nodes at once, which is probably why you are seeing high memory usage and slow processing. Batch processing is only applied to the second Cypher statement.
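As an aside, later APOC releases deprecate rock_n_roll in favor of apoc.periodic.iterate, which drives the batching from the first statement itself. If your APOC build includes it (you can check with CALL apoc.help('iterate')), a minimal sketch of the same conversion might look like the following; the batchSize and parallel values here are assumptions, not tested settings:
CALL apoc.periodic.iterate(
  // driving statement: returns the nodes to process
  'MATCH (n:Snp) RETURN n',
  // update statement: applied to each batch of nodes
  'SET n.position = toInt(n.position)',
  {batchSize: 20000, parallel: false}
)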
The apoc.periodic.commit procedure should work better for your use case. The following call will get and convert 100K nodes at a time until all of them have been processed.
CALL apoc.periodic.commit(
'MATCH (n:Snp) WHERE TOINT(n.position) <> n.position WITH n LIMIT {limit} SET n.position = TOINT(n.position) RETURN COUNT(*);',
{limit: 100000}
)
The apoc.periodic.commit procedure repeatedly invokes its Cypher query until it returns 0. The WHERE clause filters out nodes that already have an integer position, and the limit parameter specifies the batch size.
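If you want to watch the conversion progress, the filter from the batched query can be reused on its own; this count should fall toward 0 as batches commit:
MATCH (n:Snp)
WHERE TOINT(n.position) <> n.position
RETURN COUNT(*);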
Upvotes: 8