Reputation: 177
We need to add a collection to every document we ingest and would like to use CPF as the collection will depend on data in the documents.
Our document URIs have the form GUID.xml, with no leading forward slash or directory in front of them.
We have attempted to get CPF to trigger using:
document scope = directory
uri = /
and
document scope = document
uri = /
Our theory is that CPF expects document URIs to begin with a forward slash, and since ours do not, CPF is not triggering.
We have considered the crude solution of adding a basic collection to every document and using that as the document scope for CPF, but that would add data we don't otherwise need.
We would be grateful for any ideas or solutions.
Upvotes: 5
Views: 106
Reputation: 7770
I support Dave's post as the appropriate answer. However, for completeness, I have included a deeper dive into the way CPF and triggers work together and how you can accomplish what you want with the tools available. But even though you 'can' accomplish it, slashes would just be tidier.
I agree with Dave's suggestion about the '/'. Starting with a slash is a good thing in ML (not required, of course). However, I have run across a handful of things that assume a slash is there at the start.
But that means changing what you have, and MarkLogic has lots of goodies under the hood, so we can have a bit of a rummage around and see what other tricks we can try.
What is CPF? It's an application built on top of triggers that does some really convenient things and is highly configurable for pipelines. The important thing is that under the hood it runs on triggers, and triggers require a scope (no wildcard or empty scope).
Now MarkLogic is very clear that a 'directory' ends with a slash, and that applies to the root directory too. So there is no way to use directory scope or document scope in your example based on your URIs.
But we still have trgr:collection-scope() to play with. How? Well, that's easy:
- For whatever user is used to insert the documents, add a default collection in the Admin Interface under Security -> Users (or Roles).
- Then define the trigger.
For me, I added a default collection called 'default' and the following trigger definition:
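xquery version "1.0-ml";
import module namespace trgr = "http://marklogic.com/xdmp/triggers" at "/MarkLogic/triggers.xqy";

(: Evaluate this against the triggers database of your content database. :)
trgr:create-trigger("myTrigger2", "Simple trigger example",
  trgr:trigger-data-event(
    trgr:collection-scope("default"),   (: documents in the "default" collection :)
    trgr:document-content("create"),    (: fire on document creation :)
    trgr:post-commit()),                (: run after the insert commits :)
  trgr:trigger-module(0, "/dae/", "log.xqy"),
  fn:true(), xdmp:default-permissions())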
So now the trigger fires for these documents, because the inserting user has a default collection and that collection is already applied by the time the trigger runs. This is the business end of CPF: the collection scope here is the same one used in CPF configurations.
In essence, the documents are scoped to a collection by the user that inserts them (a default collection). With that, you can say that the trigger is also scoped to that user's inserts. And in MarkLogic, you ALWAYS have a user, even if it's a default one.
The path is this: document -> inserted with default collection -> trigger fires because its collection scope matches the default collection.
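For the original goal (choosing a collection from data in the document), the trigger module itself could do the work. The following is only a sketch of what a module such as /dae/log.xqy might contain. The element name `type` and the `type-` collection prefix are hypothetical and stand in for whatever field and naming scheme your documents actually use.

```xquery
xquery version "1.0-ml";
import module namespace trgr = "http://marklogic.com/xdmp/triggers" at "/MarkLogic/triggers.xqy";

(: MarkLogic passes the URI of the document that fired the trigger. :)
declare variable $trgr:uri as xs:string external;

(: ASSUMPTION: the document has a child element named "type" under its root. :)
let $type := fn:string(fn:doc($trgr:uri)/*/type)
where $type ne ""
return
  (: ASSUMPTION: "type-" is an illustrative collection naming scheme. :)
  xdmp:document-add-collections($trgr:uri, fn:concat("type-", $type))
```

Because the trigger above listens only for "create" events, adding a collection afterwards should not cause it to fire again.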
Upvotes: 4
Reputation: 8422
I found this in the Domain Scope section of the CPF Guide:
In the Admin Interface, the document scope drop-down list specifies whether the domain applies to a single document, a directory, or a collection. Each domain can only have one of these document scopes; if you need more than one of these document scopes, you can create multiple domains.
In other words, to use CPF on more than one document, you'll need to use either directories or collections to organize your data a bit.
Also note that the CPF guide says "Do not overlap domains". This means that if you ever wanted CPF to target content more precisely, you'll want a more specific organization of your content anyway (different directories for different types or sources of content, for instance). This can also be helpful during searches. Without knowing anything about your data, I can't suggest how you might break it up, but some kind of organization is typically helpful.
You'll need to do something different in your data load. I think your path of least resistance is to add a / to the beginning of your URIs.
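As a rough illustration of that change, assuming the load is done with xdmp:document-insert, the only difference is the leading slash when the URI is built. The GUID and document body below are made-up placeholders.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample values; in a real load these come from your ingest process. :)
let $guid := "3f2504e0-4f89-11d3-9a0c-0305e82c3301"
let $doc := <record><type>invoice</type></record>
(: Prefix the URI with "/" so the document sits under the root directory. :)
return xdmp:document-insert(fn:concat("/", $guid, ".xml"), $doc)
```

With URIs of this form, a CPF domain scoped to the directory / (with infinite depth) should cover the documents, which is the configuration the question already tried.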
Upvotes: 5