Is it possible to load a text file, regardless of its content, as a binary document through the MarkLogic REST APIs? More specifically, through a resource extension endpoint?
I can see it is possible with the xdmp:document-load function, but I am not sure how to do it using the REST APIs:
xdmp:document-load("C:\my\path\test.txt",
map:map() => map:with("uri", "/test/test.txt")
=> map:with("format", "binary")
)
I have tried to load the same document through the PUT /v1/documents API with the format parameter set to binary, but it was still loaded as a text file.
The use case is that I need to ingest a bunch of attachment files which occasionally include some text files. I don't need MarkLogic to index their content and in fact many of those files have encoding or format issues if MarkLogic attempts to do so.
Thank you!
With /v1/documents PUT, the format parameter is used to indicate the format of the metadata, not the document.
As described in Controlling Input and Output Content Type, the document format is determined as follows:
- Primary: URI extension MIME type mapping, as long as the request does not specify a transform function.
- Fallback: Content-type header MIME type mapping. For multipart input, the request Content-type header must be multipart/mixed, so the Content-type header for each part specifies the MIME type of the content for that part.
The file extension of the document URI is used to look up a configured MIME type. If there is a matching entry, the format configured for that MIME type is used.
Unfortunately, the explicit Content-type header does not override the implicit format determination. So, if you want to load documents that have a .txt file extension as binary() documents, you will need a workaround.
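To make the precedence concrete, here is a small JavaScript model of the rules above. It is only an illustration under stated assumptions, not MarkLogic's implementation: `resolveFormat`, `extensionOf`, `formatForMimetype` and the tiny `MIMETYPES` table are hypothetical stand-ins for the server's configured MIME type mappings.

```javascript
// Hypothetical subset of a MIME type configuration:
// file extension -> configured MIME type and document format.
const MIMETYPES = new Map([
  ["txt", { mimetype: "text/plain", format: "text" }],
  ["bin", { mimetype: "application/octet-stream", format: "binary" }],
  ["json", { mimetype: "application/json", format: "json" }],
]);

// Extension of the last path segment of a URI, lowercased ("" if none).
function extensionOf(uri) {
  const name = uri.slice(uri.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot >= 0 ? name.slice(dot + 1).toLowerCase() : "";
}

// Fallback: map the Content-type header (parameters stripped) to a format.
function formatForMimetype(contentType) {
  const mimetype = (contentType || "").split(";")[0].trim().toLowerCase();
  const entry = [...MIMETYPES.values()].find((e) => e.mimetype === mimetype);
  return entry ? entry.format : null;
}

// Models the documented precedence: the URI extension mapping wins unless
// a transform is specified; otherwise the Content-type header is used.
function resolveFormat({ uri, transform, contentType }) {
  const byExtension = MIMETYPES.get(extensionOf(uri));
  if (!transform && byExtension) {
    return byExtension.format;
  }
  return formatForMimetype(contentType);
}

// Extension mapping wins, header ignored: "text"
console.log(resolveFormat({ uri: "/test.txt", contentType: "application/octet-stream" }));
// ".bin" extension maps to binary: "binary"
console.log(resolveFormat({ uri: "/test.txt.bin", contentType: "text/plain" }));
// Transform present, so the header is used: "binary"
console.log(resolveFormat({ uri: "/test.txt", transform: "noop", contentType: "application/octet-stream" }));
```

The three calls correspond to the plain upload that still produced text, the URI-renaming workaround, and the transform workaround.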
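In order to load the text documents as binary() with /v1/documents PUT, you could:
- Change the URI so that it ends with a file extension that is mapped to a binary MIME type, such as /myTextFile.txt.bin. That may not be desired, since it changes the URI of the document from what it really is, but it does indicate that the text document is being stored as a binary document.
- Apply a transform to the request, so that the URI extension mapping is skipped and the explicit Content-type header is used instead.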
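For the .txt.bin renaming approach, a bulk-ingest script only needs to compute the alternate URI for each attachment. A minimal sketch, assuming (as the /myTextFile.txt.bin example implies) that .bin is mapped to a binary MIME type in your configuration, and assuming for illustration that .txt is the only extension that needs renaming; `TEXT_EXTENSIONS` and `binaryUriFor` are hypothetical names.

```javascript
// Hypothetical set of extensions that would otherwise be detected as text.
// Extend it to match your own MIME type configuration.
const TEXT_EXTENSIONS = new Set(["txt"]);

// Append ".bin" to URIs whose extension would be detected as text, so that
// the URI extension mapping selects a binary format instead.
// Other URIs are returned unchanged.
function binaryUriFor(uri) {
  const name = uri.slice(uri.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot + 1).toLowerCase() : "";
  return TEXT_EXTENSIONS.has(ext) ? `${uri}.bin` : uri;
}

console.log(binaryUriFor("/attachments/myTextFile.txt")); // /attachments/myTextFile.txt.bin
console.log(binaryUriFor("/attachments/scan.pdf")); // /attachments/scan.pdf
```

The trade-off is the one noted above: the stored URI no longer matches the original file name, so the application has to strip the suffix when serving the attachment.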
An example of a passthrough transform that can be applied, so that the implicit URI-extension format detection is skipped and the explicit Content-type header is used:
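// Passthrough (no-op) transform: returns the request content unchanged.
// Applying any transform makes the REST API skip the URI-extension format
// detection and use the explicit Content-type header instead.
function noop(context, params, content) {
  return content;
}
exports.transform = noop;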
Install the custom transform under the name noop. Below is an example curl command that does this; update the username/password as appropriate:
curl --anyauth --user myUsername:myPassword -X PUT -i -d "function noop(context, params, content){return content;} exports.transform=noop" -H "Content-type: application/vnd.marklogic-javascript" http://localhost:8000/LATEST/config/transforms/noop
It is then possible to invoke /v1/documents PUT with that transform and specify the Content-type as a binary MIME type (in this example, application/octet-stream):
curl --anyauth --user myUsername:myPassword -T ./test.txt -i -H "Content-type: application/octet-stream" "http://localhost:8000/v1/documents?uri=/test.txt&transform=noop"
It will then be loaded as binary() instead of text(). Checking the node kind with
doc("/test.txt")/node()/xdmp:node-kind(.)
yields: binary
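For the attachment-ingest use case, the same request is simply repeated for every file. A minimal sketch of how a Node.js client might describe each request for the transform workaround; `buildPutRequest` is a hypothetical helper, the host, port and transform name are taken from the curl example above, and authentication (curl's --anyauth negotiates digest or basic) is deliberately left out because it depends on your app server configuration.

```javascript
// Hypothetical helper: describes a /v1/documents PUT that stores a file as
// binary by applying the "noop" transform and an explicit binary Content-type.
function buildPutRequest(uri, { host = "http://localhost:8000", transform = "noop" } = {}) {
  const url = new URL("/v1/documents", host);
  // URLSearchParams percent-encodes the document URI (e.g. "/" -> "%2F").
  url.searchParams.set("uri", uri);
  url.searchParams.set("transform", transform);
  return {
    method: "PUT",
    url: url.toString(),
    headers: { "Content-type": "application/octet-stream" },
  };
}

const req = buildPutRequest("/test.txt");
console.log(req.url); // http://localhost:8000/v1/documents?uri=%2Ftest.txt&transform=noop
```

Each file's bytes would then be sent as the request body (for example with `fetch(req.url, { method: req.method, headers: req.headers, body })` in Node 18+), together with whatever authentication your app server requires.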