Antony

Reputation: 976

How to identify duplicate documents in MarkLogic server?

I have created a database in MarkLogic Server and attached one forest to it. How can I identify duplicate documents in the database?

For example, I have c.xml at the URIs /A/B/c.xml and /D/E/c.xml, both in the same forest and the same database. How can I determine whether c.xml is a duplicate?

Upvotes: 0

Views: 706

Answers (2)

DALDEI

Reputation: 3732

No two documents are "the same", period. Therefore, there are no duplicate documents. Problem solved.

That means the answer depends on your definition of "duplicate". If you mean "has the same semantic data content", then fn:deep-equal() will work for XML documents. It will not help, however, if you want non-semantic differences such as extra whitespace between attributes, attribute order, or namespace prefixes to count, because deep-equal ignores them. If you mean "same content and same properties, permissions, collections, etc.", that requires additional checks.
If you know how Git works, that is a good mental model: you can have two files with the same content, but that doesn't mean they are "duplicates".
A more subtle issue is if you mean two documents with the same URI. That can happen if you muck around with forests: create two databases, put /a.xml in each one, then detach the forest from one and attach it to the other. Now you can have two documents (with the same or different content) with the same URI. Don't do that.

For non-XML documents (or for XML documents), you can compare the serialized text. I suggest calculating a hash (such as MD5) for every document; you can then compare the hashes to see whether documents *have the same normalized text content* (which is not quite the same as "duplicate").
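If "duplicate" has to include metadata, content equality alone is not enough. Below is a minimal sketch of the "additional checks", assuming the default element output of xdmp:document-get-permissions. It deliberately leaves document properties out, since they usually include a last-modified timestamp that differs between copies. The helper names (local:same-doc-and-metadata, local:sorted-collections, local:sorted-permissions) are hypothetical.

```xquery
xquery version "1.0-ml";

declare namespace sec = "http://marklogic.com/xdmp/security";

(: Collection URIs of a document, sorted so that ordering is ignored. :)
declare function local:sorted-collections($uri as xs:string) as xs:string*
{
  for $c in xdmp:document-get-collections($uri)
  order by $c
  return $c
};

(: Permissions flattened to "role-id:capability" strings, then sorted. :)
declare function local:sorted-permissions($uri as xs:string) as xs:string*
{
  for $p in xdmp:document-get-permissions($uri)
  let $s := fn:concat($p/sec:role-id, ":", $p/sec:capability)
  order by $s
  return $s
};

(: Hypothetical helper: two URIs count as duplicates only if the content is
   deep-equal AND they share the same collections AND the same permissions.
   Document properties are not compared. :)
declare function local:same-doc-and-metadata(
  $a as xs:string,
  $b as xs:string
) as xs:boolean
{
  fn:deep-equal(fn:doc($a), fn:doc($b))
  and fn:deep-equal(local:sorted-collections($a), local:sorted-collections($b))
  and fn:deep-equal(local:sorted-permissions($a), local:sorted-permissions($b))
};

local:same-doc-and-metadata("/A/B/c.xml", "/D/E/c.xml")
```

Sorting before comparing matters because fn:deep-equal() compares sequences position by position, while collections and permissions are really unordered sets.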
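The hashing approach can be sketched roughly as follows. It is only a sketch under stated assumptions: it hashes the output of xdmp:quote() with xdmp:md5(), so it is meant for XML and text documents, and fn:collection() with no argument walks every document in the database, which is only reasonable for small databases. The name local:duplicate-groups is hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: hash the serialized form of each document and return
   one <duplicates> element per hash shared by two or more documents. :)
declare function local:duplicate-groups(
  $docs as document-node()*
) as element(duplicates)*
{
  (: Pair each document's URI with the MD5 hash of its serialized text. :)
  let $pairs :=
    for $doc in $docs
    return <doc uri="{xdmp:node-uri($doc)}" hash="{xdmp:md5(xdmp:quote($doc))}"/>
  (: Keep only the hashes that occur more than once. :)
  for $hash in fn:distinct-values($pairs/@hash)
  let $uris := $pairs[@hash = $hash]/@uri
  where fn:count($uris) gt 1
  return
    <duplicates hash="{$hash}">{
      for $u in $uris return <uri>{fn:string($u)}</uri>
    }</duplicates>
};

(: Scan every document in the current database. :)
local:duplicate-groups(fn:collection())
```

Because the hash is taken over the serialized text, two documents that fn:deep-equal() considers equal (for example, with different namespace prefixes) may still hash differently, which is exactly the gap between "same normalized text" and "duplicate".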

Upvotes: 0

Mads Hansen

Reputation: 66714

To compare two documents and determine whether they are duplicates (the same document loaded under different URIs), you can use the fn:deep-equal() function.

For example:

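```xquery
(: Compare the content of the two documents, not their URIs or metadata. :)
(: If neither URI exists, both sides are empty and the result is also true. :)
let $doc1 := fn:doc("/A/B/c.xml")
let $doc2 := fn:doc("/D/E/c.xml")
return fn:deep-equal($doc1, $doc2)
```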
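To make clear what fn:deep-equal() does and does not treat as significant, here is a small sketch using only in-memory constructed elements, so it needs no documents in the database. The element names and the urn:example namespace are made up for illustration.

```xquery
xquery version "1.0-ml";

(: Attribute order is not significant. :)
fn:deep-equal(<a x="1" y="2"/>, <a y="2" x="1"/>),                          (: true :)

(: Namespace prefixes are not compared, only the expanded names. :)
fn:deep-equal(<p:a xmlns:p="urn:example"/>, <q:a xmlns:q="urn:example"/>),  (: true :)

(: Text content is compared, so a changed value makes them unequal. :)
fn:deep-equal(<a>1</a>, <a>2</a>)                                           (: false :)
```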

Upvotes: 1
