Mehul

Reputation: 178

Can I export all of the JSON documents in a collection to a CSV in MarkLogic?

I have millions of documents in different collections in my database. Given a collection name, I need to export its documents to a CSV file on my local storage.

I tried mlcp export, but it didn't work. We cannot use corb for this because of some issues.

I want the CSV to be in a format such that an mlcp import of it would restore all the documents exactly as they were.

Upvotes: 0

Views: 580

Answers (2)

rjrudin

Reputation: 2236

ml-gradle supports exporting documents and referencing a transform that can convert each document to CSV: https://github.com/marklogic-community/ml-gradle/wiki/Exporting-data#exporting-data-to-csv

Unless all of your documents are flat, you likely need some custom code to determine how to map a hierarchical document into a flat row. So a REST transform is a reasonable solution there.
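To illustrate the kind of mapping decision involved, here is a minimal sketch in Python (not MarkLogic code; an actual REST transform would be written in server-side JavaScript or XQuery). The function names `flatten` and `docs_to_csv` and the dotted-path column naming are assumptions made for this example only.

```python
import csv
import io


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # List positions become path segments, e.g. "tags.0", "tags.1".
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        return {prefix: value}
    row = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else key
        row.update(flatten(child, path))
    return row


def docs_to_csv(docs):
    """Render already-parsed JSON documents as CSV text, one row per document."""
    rows = [flatten(doc) for doc in docs]
    # The header is the union of all paths, in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": {"first": "Ann"}, "tags": ["a", "b"]},
        {"id": 2, "name": {"first": "Bob", "last": "Lee"}},
    ]
    print(docs_to_csv(sample))
```

Note that a flattening like this is lossy: empty objects and arrays disappear, missing values and empty strings look the same, and numbers become text. That matters if the CSV is expected to restore the original documents exactly.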

You can also use a TDE template to project your documents into rows, and the /v1/rows endpoint can return results as CSV. That of course requires creating and loading a TDE template, and then waiting for the matching documents to be re-indexed.

Upvotes: 1

grtjn

Reputation: 20414

My first thought would be to use the MLCP archive feature and not export to CSV at all.

If you really want CSV, Corb2 would be my first thought. It provides CSV export functionality out of the box. It might be worth digging into why that didn't work for you.

DMSDK might work too, but it requires writing your own code to produce the CSV, which sounds cumbersome to me.

The last option that comes to mind is Apache NiFi, for which there are various MarkLogic processors. It allows very generic orchestration of data flows. It could be overkill for your purpose, though.

HTH!

Upvotes: 3
