Reputation: 962
This is my second question related to the MarkLogic Content Pump (mlcp) utility.
I am ingesting a single aggregated XML document with multiple records into MarkLogic using Content Pump. I expect the aggregate XML document to be transformed to a different format, and I expect Content Pump to generate multiple XML documents from the single large input document.
Example: Aggregated input xml document:
<root>
<data>Bob</data>
<data>Vishal</data>
</root>
Expected Output from content pump : Two documents with a different format:
Document 1 :
<data1>Bob</data1>
Document 2 :
<data1>Vishal</data1>
I am using the following XSLT to split the above document into two nodes:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:template match="root">
<xsl:apply-templates select="data"></xsl:apply-templates>
</xsl:template>
<xsl:template match="data">
<data1><xsl:value-of select="."/></data1>
</xsl:template>
</xsl:stylesheet>
output:
<?xml version="1.0" encoding="UTF-8"?>
<data1>Bob</data1>
<data1>Vishal</data1>
Following is the XQuery transform, which calls the above XSLT file to generate two nodes:
xquery version "1.0-ml";
module namespace example = "http://marklogic.com/example";
declare function example:transform(
$content as map:map,
$context as map:map
) as map:map*
{
let $attr-value :=
(map:get($context, "transform_param"), "UNDEFINED")[1]
let $the-doc := map:get($content, "value")
let $let-output:= xdmp:xslt-invoke("/marklogic.rest.transform/simple-xsl/assets/transform.xsl", $the-doc )
return (map:put(
$content, "value",
$let-output
),$content)
};
The above XQuery transform fails and returns an error. How do I modify it so that it generates and indexes multiple transformed XML documents from a single input document?
MLCP Command:
mlcp.sh import -host localhost -port 8040 \
-username admin -password admin \
-input_file_path ./parent-form.xml \
-transform_module /example/parent-transform.xqy \
-transform_namespace "http://marklogic.com/example" \
-transform_param "my-value" \
-output_collections people \
-output_permissions my-app-role,read,my-app-role,update
Upvotes: 1
Views: 436
Reputation: 20414
The transform you provided returns a single document containing multiple root elements. The transform itself will work, but MarkLogic will not allow inserting that result into the database and will throw XDMP-MULTIROOT: Document nodes cannot have multiple roots.
There are two ways to solve that. The simplest is to append /* to the xdmp:xslt-invoke call. The other is to use <xsl:result-document href="{generate-id()}.xml"> inside your XSLT. Both cause $let-output to contain a sequence instead of a single document.
However, without further changes that will result in XDMP-CONFLICTINGUPDATES, as this would write multiple results to one database URI. To solve that, you can clone the $content map:map with a small trick and provide separate URIs. For instance, like this:
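(: the original URI comes from the "uri" key of the incoming $content map :)
let $the-uri := map:get($content, "uri")
for $let-output at $i in xdmp:xslt-invoke("/marklogic.rest.transform/simple-xsl/assets/transform.xsl", $the-doc )/*
(: clone the map by round-tripping it through its XML serialization :)
let $extra-content := map:map(document{$content}/*)
let $_ := map:put($extra-content, "value", $let-output)
let $_ := map:put($extra-content, "uri", concat($the-uri, '-', $i, '.xml') )
return
$extra-content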
Note: the transform function has a return type of map:map*, meaning you can return zero or more map:map items, each containing one result.
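For reference, here is a minimal sketch of how the loop above might fit into the module from the question. It assumes the incoming $content map carries the source URI under the "uri" key and that the stylesheet path is unchanged; the $record variable name is mine. The unused transform_param lookup from the question is left out. Treat it as a starting point rather than a tested implementation.

```xquery
xquery version "1.0-ml";
module namespace example = "http://marklogic.com/example";

(: Sketch only: the question's module combined with the loop above. :)
declare function example:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  (: Assumption: "uri" and "value" are the keys mlcp supplies in $content :)
  let $the-uri := map:get($content, "uri")
  let $the-doc := map:get($content, "value")
  (: /* selects each root element of the multi-root XSLT result :)
  for $record at $i in xdmp:xslt-invoke(
      "/marklogic.rest.transform/simple-xsl/assets/transform.xsl",
      $the-doc)/*
  (: Clone the map so every record gets its own map with its own URI :)
  let $extra-content := map:map(document { $content }/*)
  let $_ := map:put($extra-content, "value", $record)
  let $_ := map:put($extra-content, "uri",
                    concat($the-uri, "-", $i, ".xml"))
  return $extra-content
};
```

With the two-record input from the question, this would be expected to return two maps whose URIs end in -1.xml and -2.xml, so the inserts no longer collide on a single URI.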
HTH!
Upvotes: 3
Reputation: 7770
You cannot use the transform function to actually split your document. Instead, the transform is called once per document being ingested.
Individual documents are created before the transform runs, and that splitting is controlled by the aggregate_ options (for example, -input_file_type aggregates together with -aggregate_record_element).
https://docs.marklogic.com/guide/ingestion/content-pump#id_65814
Upvotes: 1