Vergil C.

Reputation: 1104

MarkLogic Backup OR Restore specific piece of data

I was wondering whether a specific piece of data can be backed up or restored in MarkLogic.

We are using version 8.0-5.4 on CentOS, and the data has grown a lot.

For example, can only the last three months of data be backed up? Or, from a full backup, can only the last three months of data be restored to lower environments?

Upvotes: 1

Views: 90

Answers (1)

MarkLogic itself is unaware of the age of your content by default (unless you have enabled tracking of insert and update timestamps).

Furthermore, MarkLogic balances all content across all forests evenly based on the selected balancing strategy.

Some Ideas:

Archive:

  • In your system, find a way to isolate the old content (query or collection)
    • Then use MLCP to export the content to an archive.
    • Or, if you have Hadoop, use a similar strategy.
  • Then you can remove the content from the system.
  • This makes it totally gone - but has the benefit of no index overhead if disk space is an issue.
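
As a rough sketch of the MLCP export step, the snippet below only builds the command line for an archive export of one collection; it does not run anything. The host, port, credentials, output path, and the collection name `old-content` are hypothetical placeholders, and it assumes the old content has already been isolated into that collection and that `mlcp.sh` is on your PATH. Check the option names against the MLCP guide for your version before relying on it.

```python
import shlex


def build_mlcp_export_command(host, port, user, password, collection,
                              output_dir, mlcp="mlcp.sh"):
    """Return the argv list for an MLCP archive export of one collection.

    All argument values are supplied by the caller; nothing here is
    specific to any real environment.
    """
    return [
        mlcp, "export",
        "-host", host,
        "-port", str(port),                # an app server port that accepts XDBC
        "-username", user,
        "-password", password,
        "-mode", "local",
        "-output_type", "archive",         # archive keeps metadata with the content
        "-output_file_path", output_dir,
        "-collection_filter", collection,  # export only this collection
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    argv = build_mlcp_export_command(
        "localhost", 8000, "admin-user", "secret",
        "old-content", "/space/archive/old-content",
    )
    # Print a shell-safe command line for review before running it.
    print(" ".join(shlex.quote(a) for a in argv))
```

Once the printed command has been reviewed, the same list could be passed to `subprocess.run(argv, check=True)`. Verify the archive can be imported elsewhere before deleting the content from the source database.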

Forests

  • Using a strategy as above to isolate your old content, move it all to a single forest.
  • Take that forest offline, detach it, and then physically archive it. Unfortunately, this approach also includes the index data. You could purge it by hand - but that's a risky story for another time.
    • Note: If you were to upgrade to ML 9, then you could use time-based queries in your forest balancing strategy, roll all of your content onto a month-based forest each month, and then archive the previous month's forest - similar to log rotation.

Forest Backups

Since each forest can be backed up on its own, you could consider creating a backup of the forest and then deleting that forest. I'm not sure of the benefits of this approach. I suppose that if indexes are not included in the backup, then this approach is superior to the MLCP/Hadoop approach.

Tiered Storage

I answered the question as I interpreted it. However, the full enterprise approach would be to embrace Tiered Storage and store various data on different media types to give the most cost-effective solution without the data actually going offline.

Upvotes: 3
