thichxai

Reputation: 1147

How to exclude elements while exporting documents?

I have thousands of XML documents stored in MarkLogic. How can I exclude the instructorName and studentName elements from the documents listed in $uris and then save the results to files?

The code below saves each document to a file with every element included. I don't want the "instructorName" and "studentName" elements to appear in the saved XML files.

let $uris :=
  cts:uris(
    (),
    ("descending"),
    cts:and-query((
        cts:collection-query(("/courses")),
        cts:element-value-query(
          xs:QName("note"), "COGNITIVE   SCIENCE", "case-insensitive")
    ))
  )
for $uri in $uris
let $doc := fn:doc($uri)
let $courseID := fn:data($doc//meta:courseid)
return xdmp:save(fn:concat("/output/",$courseID,".xml"), $doc)

Thanks in advance, Thichxai

Upvotes: 0

Views: 215

Answers (2)

BenW

Reputation: 433

The forthcoming MarkLogic 9 has element-level security, which I think would solve this problem. This article describes how it works.

You'd want to create a user that has read privileges on the documents as a whole, but not on the instructorName/studentName XPaths. Then run MLCP as that user to export the documents to files.

Upvotes: 1

ehennum

Reputation: 7335

Instead of using cts:uris() followed by fn:doc(), you should just use cts:search() to get the documents matching the query in one pass:

http://docs.marklogic.com/cts:search
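
As a rough sketch, the query from the question could be passed to cts:search directly, something like the following. The query and output path are copied from the question; the `*:courseid` wildcard step is an assumption standing in for the question's `meta:` prefix, whose namespace isn't shown. Note that this version still writes each document unchanged, and results come back in search order rather than descending URI order.

```xquery
xquery version "1.0-ml";

(: One pass: cts:search returns the matching document nodes directly,
   so no separate fn:doc() lookup per URI is needed. :)
for $doc in
  cts:search(
    fn:doc(),
    cts:and-query((
      cts:collection-query("/courses"),
      cts:element-value-query(
        xs:QName("note"), "COGNITIVE   SCIENCE", "case-insensitive")
    ))
  )
(: Assumption: *:courseid matches the question's meta:courseid in any namespace. :)
let $courseID := fn:string(($doc//*:courseid)[1])
return xdmp:save(fn:concat("/output/", $courseID, ".xml"), $doc)
```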

Beyond that, I'm not entirely sure which of two goals you're trying to accomplish.

If you want to save each document without those two elements, use XPath to select everything else. The specific XPath will depend on the structure of your documents, but assuming the elements are direct children of the root element, the approach would be similar to:

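for $doc in cts:search(...)
let $root := $doc/*
(: elements to drop, assumed to be direct children of the root :)
let $hide := $root/(instructorName|studentName)
let $keep := $root/node() except $hide
(: "document { }" is the document constructor; root attributes are copied as well :)
let $newDoc := document { element {fn:node-name($root)} {$root/@*, $keep} }
return xdmp:save(..., $newDoc)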
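
If the two elements can appear deeper than the top level, a recursive copy is one possible alternative. The sketch below is only an illustration: `local:strip` is a hypothetical helper, the sample `course` document is made up, and matching on local name (ignoring namespaces) is an assumption about your data.

```xquery
xquery version "1.0";

(: Hypothetical helper: copies a node, dropping every element whose
   local name is listed in $names, at any depth. :)
declare function local:strip($node as node(), $names as xs:string*) as node()?
{
  typeswitch ($node)
    case document-node() return
      document { for $child in $node/node() return local:strip($child, $names) }
    case element() return
      if (fn:local-name($node) = $names) then ()
      else
        element { fn:node-name($node) } {
          $node/@*,
          for $child in $node/node() return local:strip($child, $names)
        }
    (: text, comments and processing instructions are kept as they are :)
    default return $node
};

(: Made-up sample input, for illustration only. :)
let $sample :=
  <course>
    <courseid>C101</courseid>
    <section>
      <instructorName>A. Teacher</instructorName>
      <room>12</room>
    </section>
    <studentName>B. Student</studentName>
  </course>
return local:strip($sample, ("instructorName", "studentName"))
```

In the loop above, the result of local:strip($doc, ("instructorName", "studentName")) could be passed to xdmp:save in place of $newDoc.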

If you instead want to skip the documents that contain those elements altogether, try changing your query to something like:

cts:and-query((
    cts:collection-query("/courses"),
    cts:not-query(
        cts:element-query(
            (xs:QName("instructorName"), xs:QName("studentName")),
            cts:true-query()
            ))
    ))

For more, see:

http://docs.marklogic.com/cts:not-query

Hope that helps.

Upvotes: 1
