Casper Plomp

Reputation: 11

Is there a way to schedule backups for all databases in an optimized way in MarkLogic?

I administer more than 40 MarkLogic clusters at our company. Each cluster has different databases of different sizes. I would like to know whether anyone has written code to create all database backups in one go.

At the moment I use an XQuery (.xqy) script that goes over all my databases and schedules a backup for each one. I leave 5 minutes between backups, because running 2 backups simultaneously might result in inconsistent files on disk (at least, that used to be the case).

It would be nice to be able to schedule a backup at cluster level (instead of database level). That schedule should take care of backing up all databases in the cluster, with an option to exclude certain databases or to include only specific ones.

This cluster backup should back up the first database, wait for it to finish, then immediately start the next database backup, and so on. This would reduce the total backup time. At the moment I have backup schedules that take more than an hour (from first to last backup) for a cluster that holds only 5 GB of data in total :-(. The system databases alone already account for 10 times 5 minutes of waiting time.
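The "back up one database, wait, then start the next" loop described above can be sketched roughly as follows. This is only an illustration of the control flow, not a MarkLogic client: `start_backup` and `backup_status` are hypothetical callables that you would have to implement yourself against your cluster, and the status strings "in-progress" and "completed" are assumptions made for the sketch.

```python
import time
from typing import Callable, Iterable, List


def run_backups_sequentially(
    databases: Iterable[str],
    start_backup: Callable[[str], str],
    backup_status: Callable[[str], str],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Back up each database in turn; start the next only when the previous ends.

    start_backup(name) -> job id and backup_status(job_id) -> status string are
    hypothetical wrappers supplied by the caller (assumption, not a real API).
    Returns the names of the databases whose backup did not end as "completed".
    """
    failed = []
    for name in databases:
        job_id = start_backup(name)
        status = backup_status(job_id)
        # Poll until the job leaves the (assumed) "in-progress" state.
        while status == "in-progress":
            sleep(poll_seconds)
            status = backup_status(job_id)
        if status != "completed":
            failed.append(name)
    return failed
```

With a loop like this there is no fixed waiting time between backups: a small database that finishes in seconds is followed by the next one after at most one polling interval.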

In MarkLogic API terminology, I'm thinking about something like this:

admin:cluster-weekly-backup(
   $backup-dir as xs:string,
   $backup-period as xs:positiveInteger,
   $days as xs:string+,
   $start-time as xs:time,
   $max-backups as xs:unsignedLong,
   $include-databases as xs:string,     
   $exclude-databases as xs:string,
   $backup-security-db as xs:boolean,
   $backup-schemas-db as xs:boolean,
   $backup-triggers-db as xs:boolean,
   [$include-replicas as xs:boolean],
   [$journal-archiving as xs:boolean],
   [$journal-archive-path as xs:string],
   [$lag-limit as xs:unsignedLong]
) as element(configuration)

In this case you specify either $include-databases or $exclude-databases, but not both. The default for $include-databases is "All". The default for $exclude-databases is "None".

If you specify 1 or more databases in $include-databases, only those databases are backed up. If you specify 1 or more databases in $exclude-databases, those databases are excluded from the backup.
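The include/exclude rule above could be expressed along these lines. It is a minimal sketch: the function name `select_databases` is made up for illustration, and passing nothing stands in for the "All" and "None" defaults. The list of all databases is expected to be fetched at run time, so newly created databases are picked up without changing the schedule.

```python
from typing import Iterable, List, Optional


def select_databases(
    all_databases: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the databases to back up, keeping the order of all_databases.

    include=None means "All", exclude=None means "None" (assumed encoding of
    the defaults). Specifying both include and exclude is rejected.
    """
    names = list(all_databases)
    wanted = set(include or ())
    skipped = set(exclude or ())
    if wanted and skipped:
        raise ValueError("specify include or exclude, not both")
    if wanted:
        unknown = wanted - set(names)
        if unknown:
            # Fail loudly rather than silently skipping a misspelled name.
            raise ValueError(f"unknown databases: {sorted(unknown)}")
        return [n for n in names if n in wanted]
    return [n for n in names if n not in skipped]
```

For example, `select_databases(dbs, exclude=["Schemas"])` returns every name in `dbs` except "Schemas".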

I hope someone has already created something like this, or else can write some code for me.

My goal is to have only one scheduled backup job per cluster that automatically includes all databases, so that even newly created databases are backed up.

Casper

Upvotes: 1

Views: 89

Answers (1)

Casper Plomp

Reputation: 11

Thank you for your reply, Fiona.

I realize that admin:cluster-***-backup is not a valid function; I just wanted to point out that it would be convenient if there were one.

In my experience, running multiple backups at the same time while including the triggers and security databases could result in backups being written to the wrong directory, so I want to run the backups sequentially.

I now schedule all backups 5 minutes apart, but even for small databases that means it takes at least 1 hour to complete all backups.

For large databases it takes even longer, because you have to leave some margin between 2 consecutive backups.
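To make the arithmetic concrete, here is a small sketch comparing the two approaches. The function names are hypothetical and the durations in the usage note are made-up illustration values, not measurements from any cluster.

```python
from typing import Sequence


def staggered_window(durations_min: Sequence[float], gap_min: float) -> float:
    """Minutes from first start to last finish when backup i starts at i * gap_min."""
    if not durations_min:
        return 0.0
    return float(max(i * gap_min + d for i, d in enumerate(durations_min)))


def sequential_window(durations_min: Sequence[float]) -> float:
    """Minutes from first start to last finish when each backup starts as the previous ends."""
    return float(sum(durations_min))


def has_overlap(durations_min: Sequence[float], gap_min: float) -> bool:
    """True if some backup is still running when the next staggered one starts."""
    return any(d > gap_min for d in durations_min[:-1])
```

With 13 databases that each take 1 minute, `staggered_window([1] * 13, 5)` gives 61.0 minutes, while `sequential_window([1] * 13)` gives 13.0. The staggered schedule also only stays overlap-free while every backup fits inside the gap, which is why large databases need extra margin: `has_overlap([7, 1], 5)` is True.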

To schedule them I use an XQuery (.xqy) script that also creates the backup directories. I don't even have to specify the databases; it takes all databases from the hosts.

The issue is not the scheduling itself, but minimizing the time between the start of the first backup and the end of the last one, so that the backup window is as short as possible and does not affect performance during operating hours.

I also recently found out that the triggers, modules and security databases contain information that is specific to another database, so I want to include them all in the backup of those databases.

Those backups are indeed for DR; all environments run with local failover in 3-node clusters for HA.

Casper

Upvotes: 0
