John

Reputation: 11841

How to take a snapshot of neo4j database

I see that backups can be taken of a running Neo4j database, either via Java or via the backup tool.

The backup will obviously take some time to complete, during which time additional nodes may be added, modified or deleted. Is it possible to take a snapshot of the graph database at a particular instant in time?

My use case: Neo4j is used to store events, which are also stored elsewhere. I'd like to take a snapshot of the graph at an instant in time. When it's restored at a later date, I want to know what is missing from the graph based on when the backup was taken, and then reconstruct a complete, up-to-date version of the database by adding the missing events.

Upvotes: 3

Views: 3922

Answers (2)

Stefan Armbruster

Reputation: 39915

The neo4j-backup tool is part of Neo4j Enterprise edition. The backup it takes is consistent as of the moment you started it. After the backup is finished, a verbose consistency check is run to validate recoverability. It works either as a full backup or incrementally.
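For the use case in the question, that means the time at which the backup was started is the cut-off for what the restored graph contains. Here is a minimal sketch of picking the events to replay after a restore, assuming you record that start time yourself and that each event in the external store carries a timestamp. The `Event` class and `events_to_replay` function are hypothetical names for illustration, not part of Neo4j.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    # Hypothetical shape of an event held in the external store.
    event_id: str
    occurred_at: datetime


def events_to_replay(events, backup_started_at):
    """Return events at or after the backup start time, oldest first.

    Events stamped exactly at the start instant are ambiguous, so they are
    included. This assumes replaying an event twice is harmless.
    """
    missing = [e for e in events if e.occurred_at >= backup_started_at]
    return sorted(missing, key=lambda e: e.occurred_at)


if __name__ == "__main__":
    started = datetime(2015, 1, 1, 3, 0, tzinfo=timezone.utc)
    store = [
        Event("before", started - timedelta(minutes=1)),
        Event("after", started + timedelta(minutes=5)),
    ]
    for event in events_to_replay(store, started):
        print(event.event_id)
```

Including the boundary events only works if re-applying an event does not create duplicates in the graph; if that does not hold for your data, you need a stricter way to decide what the backup already contains.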

This tool does not support restoring to a given point in time or comparing with other backups. A point-in-time restore can be achieved by combining it with a classic file backup tool. I've had good experience with backup2l: neo4j-backup would be started as part of backup2l's PRE_BACKUP. The same approach should work with any other backup tool out there.

Using your backup tool, you can then retrieve the full graph.db directory as it was at the desired point in time from your archives and use it.

Upvotes: 1

FrobberOfBits

Reputation: 18002

There's a related question that has a good discussion of this, so let me cut to the chase.

If you're using the commercial version of neo4j, then neo4j backup options and/or the backup tool are your best options.

If you're using community edition, then you can't do online backups at present. I have several applications that run using neo4j community, and we have a cron job that runs at 03:00. It shuts down the application and creates a copy of the database in another location (by copy, I mean it actually creates a tar.gz archive of the DB directory). Once this and other maintenance is complete, the application is restarted.

Depending on file copy performance and DB size, this isn't too bad. We have a moderately sized DB and we simply accept about 10 minutes of downtime every night.
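A rough sketch of what such a nightly job could look like, written in Python rather than as the actual script we run. The function names, the stop/start commands, and the paths are all placeholders you would need to replace; the only fixed idea is stop, archive the DB directory to a timestamped tar.gz, then start again.

```python
import subprocess
import tarfile
from datetime import datetime
from pathlib import Path


def archive_db(db_dir, dest_dir, now=None):
    """Write <dest_dir>/<db name>-<timestamp>.tar.gz and return its path."""
    db_dir = Path(db_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{db_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the DB directory.
        tar.add(db_dir, arcname=db_dir.name)
    return archive


def cold_backup(db_dir, dest_dir, stop_cmd, start_cmd):
    """Stop the application, archive the DB directory, then restart it."""
    subprocess.run(stop_cmd, check=True)
    try:
        return archive_db(db_dir, dest_dir)
    finally:
        # Restart even if archiving failed, so downtime stays bounded.
        subprocess.run(start_cmd, check=True)


# Example call (hypothetical service name and paths):
# cold_backup("/var/lib/myapp/graph.db", "/backups/neo4j",
#             ["systemctl", "stop", "myapp"], ["systemctl", "start", "myapp"])
```

The timestamp in the archive name doubles as a record of when the snapshot was taken, which is what you need later to work out which events are missing from a restored copy.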

Upvotes: 2
