Amit Gope

Reputation: 130

cts:uri-match to pick a particular format

In my MarkLogic database, we have documents whose URIs follow formats like these:

/documents/12345.xml
/documents/12-abc.xml
/documents/abc-123-def.xml
/12345.xml

I want to pass a regex to cts:uri-match so that it picks only the URIs that conform to the format

> /documents/{integer-values}.xml

Please suggest how to make this work. There are millions of documents in the database, and I want to select only the URIs that conform to the format above, because I will be running a CORB process on those documents to transform them. I don't want to fetch all the URIs and then filter them with fn:matches.

Upvotes: 2

Views: 599

Answers (1)

grtjn

Reputation: 20414

Unfortunately, cts:uri-match takes a wildcard pattern, not a regex. The closest you can get is a pattern like "/documents/*.xml". Depending on your dataset, that alone could cut the number of results down drastically, although it still lets through URIs such as /documents/12-abc.xml and /documents/abc-123-def.xml. You can then filter out those false positives with an additional fn:matches predicate. Something like:

cts:uri-match('/documents/*.xml')[fn:matches(., '^/documents/\d+\.xml$')]
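To check which of the four sample URIs from the question survive each step, here is a small sketch that runs without a database. The hard-coded sequence stands in for the URI lexicon, and the starts-with/ends-with test only mimics what the wildcard "/documents/*.xml" lets through; both are illustrative assumptions, not what cts:uri-match does internally.

```xquery
(: The four sample URIs from the question, hard-coded so the
   filtering can be tried without a database. In a real query
   this sequence would come from cts:uri-match('/documents/*.xml'). :)
let $uris := (
  "/documents/12345.xml",
  "/documents/12-abc.xml",
  "/documents/abc-123-def.xml",
  "/12345.xml"
)
(: Step 1: mimic the wildcard '/documents/*.xml' (prefix + suffix test) :)
let $candidates :=
  $uris[fn:starts-with(., "/documents/") and fn:ends-with(., ".xml")]
(: Step 2: keep only URIs whose file name is all digits :)
return $candidates[fn:matches(., "^/documents/\d+\.xml$")]
```

Only /documents/12345.xml comes back: the wildcard step already drops /12345.xml, and the regex drops the two non-numeric file names.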

So this is perhaps a little less optimal than passing in a regex directly, but much better than running a regex over every URI in the database. It should work just fine with millions of URIs.
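Since the URIs are meant to feed a CORB job, the same expression could serve as the URIs (selector) module. This is a sketch assuming CoRB's usual convention that the selector returns the URI count first, followed by the URIs, and assuming the URI lexicon is enabled on the database (cts:uri-match relies on it); check both against your CoRB version and database settings.

```xquery
xquery version "1.0-ml";

(: Sketch of a CoRB URIS-MODULE (selector).
   Assumption: CoRB reads the first item as the number of URIs
   and the remaining items as the URIs to process. :)

(: The wildcard narrows the set using the URI lexicon;
   the fn:matches predicate then removes non-numeric file names. :)
let $uris :=
  cts:uri-match("/documents/*.xml")[fn:matches(., "^/documents/\d+\.xml$")]
return (fn:count($uris), $uris)
```

Keeping the regex in the selector means the transform module only ever sees URIs in the /documents/{integer}.xml form.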

HTH!

Upvotes: 5
