Reputation: 23
I use a WebDAV server to import PDF, DOC, PPTX and XLSX files into my database by drag and drop. My WebDAV server is called "CPF", its root is "/" and its port number is 9999.
I also installed the Content Processing Framework (CPF) with the standard configuration.
Could it be that I have not met the required security requirements?
For this case MarkLogic says:
Set the Needed Permissions on the Root Directory
When you add documents to the database for conversion, the user who adds the documents must have the needed permissions to add and modify documents. If you are using WebDAV server to drag-and-drop documents into the database, the root directory of the WebDAV server must also have the needed permissions.
One simple way to accomplish these security requirements is to do the following:
Create a URI privilege for the URI that is configured as the root directory of your WebDAV server.
Create a role that has the URI privilege and has default permissions of read, insert, and update for the role.
Set the permissions on the WebDAV root directory for the role you created. For example, if the role you created is named webdav, and the root directory has the URI /webdav/root/, run a query (as a privileged user) similar to the following:
xdmp:document-set-permissions("/webdav/root/",
  ( xdmp:permission("webdav", "read"),
    xdmp:permission("webdav", "insert"),
    xdmp:permission("webdav", "update") ) )
You can check the permissions with the following query:
xdmp:document-get-permissions("/webdav/root/")
Grant the new role (webdav in the example above) to the user who accesses the WebDAV server.
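Applied to the setup described above (WebDAV root "/"), the role and privilege steps quoted from the documentation might look roughly like the following sketch. It is not taken from the MarkLogic guide: the privilege name `webdav-root-uri` and the user name `my-webdav-user` are hypothetical placeholders, the role name `webdav` is borrowed from the quoted example, and the statements are assumed to be run by an admin user against the Security database.

```xquery
xquery version "1.0-ml";
(: 1. Create the role (name taken from the documentation example). :)
import module namespace sec = "http://marklogic.com/xdmp/security"
  at "/MarkLogic/security.xqy";
sec:create-role("webdav", "Role for WebDAV uploads", (), (), ())
;
xquery version "1.0-ml";
(: 2. Give the role default permissions of read, insert and update.
      Separate statement so the role already exists when referenced. :)
import module namespace sec = "http://marklogic.com/xdmp/security"
  at "/MarkLogic/security.xqy";
sec:role-set-default-permissions("webdav", (
  xdmp:permission("webdav", "read"),
  xdmp:permission("webdav", "insert"),
  xdmp:permission("webdav", "update")))
;
xquery version "1.0-ml";
(: 3. Create a URI privilege for the WebDAV root ("/" in this setup)
      and assign it to the role. "webdav-root-uri" is a made-up name. :)
import module namespace sec = "http://marklogic.com/xdmp/security"
  at "/MarkLogic/security.xqy";
sec:create-privilege("webdav-root-uri", "/", "uri", "webdav")
;
xquery version "1.0-ml";
(: 4. Grant the role to the user that connects to the WebDAV server.
      "my-webdav-user" is a placeholder for the real user name. :)
import module namespace sec = "http://marklogic.com/xdmp/security"
  at "/MarkLogic/security.xqy";
sec:user-add-roles("my-webdav-user", "webdav")
```

The `xdmp:document-set-permissions` call from the quoted documentation would then be run against the content database, with the WebDAV root URI in place of `/webdav/root/`.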
I don't understand which "role" and which "root directory" they are talking about in my case.
But what if the error comes from somewhere else? Why are some documents converted into .xml files, others into .xhtml files, and about 50% of my original files ignored and not converted at all?
As suggested by Dave Cassel, I ran xdmp:document-properties() for one of the records that had failed to process. Below is the result:
<?xml version="1.0" encoding="UTF-8"?>
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
<cpf:processing-status xmlns:cpf="http://marklogic.com/cpf">done</cpf:processing-status>
<cpf:property-hash xmlns:cpf="http://marklogic.com/cpf">93bdf4b50736752e0155c8e16fd42544</cpf:property-hash>
<cpf:last-updated xmlns:cpf="http://marklogic.com/cpf">2016-07-25T11:26:13.006+02:00</cpf:last-updated>
<cpf:state xmlns:cpf="http://marklogic.com/cpf">http://marklogic.com/states/property-updated</cpf:state>
<cpf:self xmlns:cpf="http://marklogic.com/cpf">/XXX/PDFs/XXXXX.pdf</cpf:self>
<Win32CreationTime xmlns="urn:schemas-microsoft-com:">Mon, 25 Jul 2016 08:05:44 GMT</Win32CreationTime>
<Win32LastAccessTime xmlns="urn:schemas-microsoft-com:">Mon, 25 Jul 2016 09:26:12 GMT</Win32LastAccessTime>
<Win32FileAttributes xmlns="urn:schemas-microsoft-com:">00000000</Win32FileAttributes>
<Win32LastModifiedTime xmlns="urn:schemas-microsoft-com:">Mon, 25 Jul 2016 08:05:44 GMT</Win32LastModifiedTime>
</prop:properties>
Upvotes: 0
Views: 117
Reputation: 8422
CPF stores information about state changes and errors in a document's properties. To diagnose what's going on, go to Query Console and run xdmp:document-properties() on one of the documents that did not get processed. That will likely tell you what the error is.
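A minimal sketch of that check, assuming it is run in Query Console against the content database. The URI is only a placeholder; substitute the URI of a document that was not converted.

```xquery
xquery version "1.0-ml";
declare namespace cpf = "http://marklogic.com/cpf";

(: Placeholder URI: replace with a document that was not processed. :)
let $uri := "/XXX/PDFs/XXXXX.pdf"
(: Return only the properties in the CPF namespace (state, status, ...). :)
return xdmp:document-properties($uri)//cpf:*
```

Restricting the result to the `cpf` namespace keeps the CPF bookkeeping separate from any other properties stored on the document.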
Looking at the properties that you've added, I see that the state is http://marklogic.com/states/property-updated and I see a set of Microsoft properties. Looking at the pipelines you get when you go through the Admin UI's Content Processing Install tab, that state appears to be a dead end, i.e., no other pipeline uses that state as a starting point. Do you have any other processing in place that creates those Microsoft properties?
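To see how many documents ended up in that state, something like the following sketch could be run in Query Console against the content database. It walks every properties fragment, so it assumes a database small enough for a full scan to be acceptable.

```xquery
xquery version "1.0-ml";
declare namespace cpf = "http://marklogic.com/cpf";

(: The state reported in the properties posted in the question. :)
let $state := "http://marklogic.com/states/property-updated"
(: With no arguments, xdmp:document-properties() returns the
   properties of all documents in the database. :)
for $props in xdmp:document-properties()
where $props//cpf:state = $state
return xdmp:node-uri($props)
```

If the returned URIs match the files that were not converted, that would support the idea that the property update is what moves them out of the conversion pipeline.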
Upvotes: 4