Benjamin Scharbau

Reputation: 2089

Move properties from relation to node in Neo4J on large datasets

I'm trying to move a property I've set up on a relationship in Neo4J to one of its member nodes, because I want to index that property, and as of version 2.2.5, which I am using, indexing on relationships is not possible.

However, when I try to move it with the Cypher command MATCH (k)-[l]->(m) SET m.key = l.key, the request fails due to a lack of memory, and I have no way to add more memory.

Does anyone know of a good way to do this without needing lots of memory when dealing with large (ca. 20M) datasets?

Upvotes: 1

Views: 104

Answers (3)

MicTech

Reputation: 45033

If it's a one-time operation, I highly recommend writing an unmanaged extension.

It will be much faster than Cypher.

Here is an example:

Label startNodeLabel = DynamicLabel.label("StartNode");
Label endNodeLabel = DynamicLabel.label("EndNode");
RelationshipType relationshipType = DynamicRelationshipType.withName("RelationshipType");
String nodeProperty = "nodeProperty";
String relationshipProperty = "relationshipProperty";

try(Transaction tx = database.beginTx()) {
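    // findNodes returns a lazy iterator; consume it directly rather than copying every node into a collection.
    final ResourceIterator<Node> nodes = database.findNodes(startNodeLabel);

    while (nodes.hasNext()) {
        final Node startNode = nodes.next();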
        if (startNode.hasRelationship(relationshipType, Direction.OUTGOING)) {
            final Iterable<Relationship> relationships = startNode.getRelationships(relationshipType, Direction.OUTGOING);

            for (Relationship relationship : relationships) {
                final Node endNode = relationship.getOtherNode(startNode);

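                // Skip relationships without the property; getProperty would otherwise throw NotFoundException.
                if (endNode.hasLabel(endNodeLabel) && relationship.hasProperty(relationshipProperty)) {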
                    endNode.setProperty(nodeProperty, relationship.getProperty(relationshipProperty));
                }
            }
        }
    }
    tx.success();
}

Upvotes: 2

cybersam

Reputation: 66999

You can use LIMIT to restrict the query to a specific number of rows, and then repeat the query until no more rows are returned. This also limits memory usage.

For example, if you also wanted to remove the key property from the relationship at the same time (and you wanted to process 100K rows each time):

[EDITED]

MATCH (k)-[l]->(m)
WHERE HAS(l.key)
SET m.key = l.key
REMOVE l.key
WITH l
LIMIT 100000
RETURN COUNT(*) AS nRows;

This query will return an nRows value of 0 when you are done.
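The "repeat until nRows is 0" loop can be driven from a small client program. The following is only a minimal sketch: BatchRunner and runUntilDone are hypothetical names, and the LongSupplier stands in for whatever code actually sends the query above to the server and reads back nRows (that part is not shown and is an assumption).

```java
import java.util.function.LongSupplier;

final class BatchRunner {

    /**
     * Calls runBatch repeatedly until it reports that no rows were processed.
     * runBatch is a hypothetical stand-in for "execute the batched Cypher
     * query once and return its nRows value".
     * Returns the total number of rows processed across all batches.
     */
    static long runUntilDone(LongSupplier runBatch) {
        long total = 0;
        long rows;
        while ((rows = runBatch.getAsLong()) > 0) {
            total += rows;
        }
        return total;
    }

    public static void main(String[] args) {
        // Fake database for illustration: 250,000 relationships still carry the property.
        long[] remaining = {250_000};
        long total = runUntilDone(() -> {
            long n = Math.min(100_000, remaining[0]); // one batch of at most 100K rows
            remaining[0] -= n;
            return n;
        });
        System.out.println("rows processed: " + total);
    }
}
```

Because each call is a separate request, each batch is committed on its own, so an interrupted run can simply be restarted.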

Upvotes: 1

michaeak

Reputation: 1669

If you do not want to write an unmanaged extension because this is a one-time task, you can also write, for example, a shell script that calls the Linux curl command in a loop, advancing SKIP by the LIMIT on each iteration. The advantage is that the property does not have to be removed from the relationship: the values are copied rather than moved.

MATCH (k)-[l]->(m)
WITH l, m SKIP 200000 LIMIT 100000
SET m.key = l.key
RETURN COUNT(*) AS nRows

Replace 200000 with the value of the loop variable.
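As a rough illustration of the loop variable, the sketch below generates one query string per batch. SkipLimitQueries, queryFor, and queriesFor are hypothetical names, the total row count is assumed to be known in advance (for example from a separate COUNT query), and sending each statement to the server is left out. Note that m is carried through WITH so that SET can still refer to it.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipLimitQueries {

    /** Builds the batched query for one (skip, limit) window. */
    static String queryFor(long skip, long limit) {
        return "MATCH (k)-[l]->(m) "
             + "WITH l, m SKIP " + skip + " LIMIT " + limit + " "
             + "SET m.key = l.key "
             + "RETURN COUNT(*) AS nRows";
    }

    /**
     * Returns one query per batch, with SKIP advancing by batchSize
     * until totalRows (assumed known up front) is covered.
     */
    static List<String> queriesFor(long totalRows, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (long skip = 0; skip < totalRows; skip += batchSize) {
            queries.add(queryFor(skip, batchSize));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Print the statements for 250,000 rows in batches of 100,000.
        for (String q : queriesFor(250_000, 100_000)) {
            System.out.println(q);
        }
    }
}
```

SKIP without an ORDER BY relies on the server returning rows in the same order on every call, which is an assumption here; since this variant does not remove the source property, rerunning a window is harmless.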

Upvotes: 1
