Reputation: 8422
We have a requirement to store and retrieve well-formed HTML5 documents in MarkLogic using the Java Client API or REST API.
Each document has an '.html' extension and the standard HTML5 doctype. When documents are inserted, they are stored as text documents by default.
We would like to use all the goodness that MarkLogic provides for search and manipulation of the documents as if they were XHTML, but we need to preserve the HTML5 doctype and .html extension for compatibility with other tools. I am sure we are not the only ones to have encountered this scenario.
We have tried changing the HTML mimetype to XML, but when documents are inserted the doctype gets replaced with the XML doctype. Is there a way to insert and retrieve well-formed HTML5 documents without losing the doctype?
Upvotes: 2
Views: 335
Reputation: 7335
Expanding a bit on WST's answer, you could store the document as XHTML and do the conversion to HTML5 in a REST API transform.
A possible XQuery transform for the REST API:
xquery version "1.0-ml";

module namespace html5ifier =
    "http://marklogic.com/rest-api/transform/html5ifier";

declare default function namespace "http://www.w3.org/2005/xpath-functions";
declare option xdmp:mapping "false";

declare function html5ifier:transform(
    $context as map:map,
    $params as map:map,
    $content as document-node()
) as document-node()
{
    map:put($context,"output-type","text/html"),
    document{text{
        xdmp:quote($content,
            <options xmlns="xdmp:quote">
                <method>html</method>
                <media-type>text/html</media-type>
                <doctype-public>html</doctype-public>
            </options>)
    }}
};
If your REST server was on port 8011, you would install the transform with a PUT request:
http://localhost:8011/v1/config/transforms/html5ifier
Then, you could GET the persisted XHTML document as HTML5 using the transform
http://localhost:8011/v1/documents?uri=/path/to/the/doc.xhtml&transform=html5ifier
You could make additional changes to the XHTML document within the transform (either on the XML before quoting or on the string after quoting).
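As a rough illustration of a change made on the string after quoting, here is a minimal sketch. It assumes the serialized output carries the XHTML namespace declaration and that your downstream tools would rather not see it. The function name local:strip-xhtml-namespace is hypothetical, and whether the declaration actually appears depends on the xdmp:quote options you use.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of the transform above:
   removes the XHTML namespace declaration from already-quoted markup. :)
declare function local:strip-xhtml-namespace($html as xs:string) as xs:string
{
    fn:replace($html, ' xmlns="http://www\.w3\.org/1999/xhtml"', '')
};

(: Inline sample input, for illustration only. :)
local:strip-xhtml-namespace(
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head></html>')
```

Inside the transform you would apply a helper like this to the result of xdmp:quote before wrapping it in the text node.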
See also:
http://markmail.org/message/qmsos7np64ohyctp
Upvotes: 1
Reputation: 11771
There is no native way to keep the doctype in the database (XQuery doesn't support doctypes). But with some logic you could add the doctype back when a document is requested.
For example:
declare function local:get-with-doctype(
    $document as document-node()
) as xs:string
{
    (: Serialize the stored XML; prepend the HTML5 doctype for '.html' URIs. :)
    if (ends-with(xdmp:node-uri($document), '.html'))
    then concat('<!DOCTYPE html>', xdmp:quote($document))
    else xdmp:quote($document)
};
Alternatively, you could parse the doctype out of the document when it's inserted and store it in a document property. Then when the document is requested, you could always add the one from the property. However, that would probably only be worth it if you were required to handle many doctypes.
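A minimal sketch of that property-based approach might look like the following. The function names and the property element name "doctype" are assumptions made for illustration, and local:remember-doctype assumes the document has already been inserted at the given URI.

```xquery
xquery version "1.0-ml";

(: Hypothetical helpers; the property name "doctype" is an assumption. :)

(: Return the leading <!DOCTYPE ...> declaration of the raw source, if any. :)
declare function local:extract-doctype($source as xs:string) as xs:string?
{
    if (fn:matches($source, "^\s*<!DOCTYPE[^>]*>", "i"))
    then fn:replace($source, "^\s*(<!DOCTYPE[^>]*>).*$", "$1", "is")
    else ()
};

(: Store the doctype as a property of a document that already exists. :)
declare function local:remember-doctype(
    $uri as xs:string,
    $doctype as xs:string
) as empty-sequence()
{
    xdmp:document-set-property($uri, <doctype>{$doctype}</doctype>)
};

(: Serialize the document and prepend whatever doctype was stored, if any. :)
declare function local:get-with-stored-doctype($uri as xs:string) as xs:string
{
    let $doctype := xdmp:document-get-properties($uri, xs:QName("doctype"))[1]
    return fn:concat(fn:string($doctype), xdmp:quote(fn:doc($uri)))
};

(: Exercise the extraction on an inline sample source. :)
local:extract-doctype('<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head></html>')
```

If no doctype property was stored, local:get-with-stored-doctype simply returns the quoted document with nothing prepended.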
Upvotes: 1