Gunawan Deng
Gunawan Deng

Reputation: 41

What is the best way to sync data between 2 Neo4j databases?

Currently we have 2 Neo4j databases in our application. We use Spring Data Neo4j (SDN).

First database is our STAGING database, and the other is LIVE database. This is more for the purpose of Versioning and to prevent user messing up LIVE database.

We have about 50.000 nodes and 150.000 relationships. Our STAGING area is meant for small subset of users, and we allow doing CRUD operations only in STAGING. After the user has finalized/verified the data in STAGING, we will push the dataset to LIVE database. LIVE is only meant for READ ONLY database, CRUD is only allowed during the sync process.

I am wondering what's the best way to sync data between both databases and the sync should allow incremental updates.

I tried to do sync node by node using Cypher Query + SDN, but I hit a roadblock, it takes way too long.

@Query("START n=node:ItemCode('code:*') "
     + "WHERE has(n.status) AND n.status = 'ACTIVE') "
     + "RETURN n");
public Iterable<Item> getAllActiveItems();

On SyncService class which does the sync process from STAGING to LIVE db:

Iterable<Item> items = stagingRepo.getAllActiveItems();
for(Item item:items) {

    Item item = liveRepo.findItem(item.getCode());
    if(item == null) {
        //create item
    } else {
        //update item
    }
}

[UPDATE] Based on answer by Peter Neubauer, I would like to explore further if there is Database Link concept so that the Cypher query can run on two databases at one go.

Any ideas are welcome.

Thanks, Gunawan

Upvotes: 3

Views: 1499

Answers (1)

Peter Neubauer
Peter Neubauer

Reputation: 6331

Is there any way you can do the find and the update in one Cypher query and execute it in the neo4jTemplate.query("....", params) function?

If you use the code above, all matching stuff has to be materialized form the database into Item object and then saved again in your repo functions, creating that overhead.

Maybe a simple

START n=node:ItemCode('code:*')
WHERE has(n.status) AND n.status = 'ACTIVE')
WITH n
SET n.updated=1

and one for the new items would do the job? That way, the work is done in the DB and does not touch the SDN mapping layers.

Upvotes: 0

Related Questions