Reputation: 680
I'm using MarkLogic 7.0, java-client-api 2.0.5.
Let's say I have the following document:
{"id": 233457657, "message": "8=FIX.4.3 9=118 35=A 34=1 49=ABCMKD"}
I want this document to be selected when I pass, for example, "34=1 49=ABC" to the query, along with any other documents whose message value contains that substring. How should I build the query?
I tried the following, but it doesn't seem to do what I expect:
new StructuredQueryBuilder().term("*" + substring + "*");
I know this may be a very basic question, but I'm pretty new to ML and a bit confused by the docs. Thank you!
Upvotes: 1
Views: 1255
Reputation: 504
If you store your documents in MarkLogic's native JSON-as-XML form, they look like this:
<?xml version="1.0" encoding="UTF-8"?>
<json type="object" xmlns="http://marklogic.com/xdmp/json/basic">
  <id type="number">233457657</id>
  <message type="string">8=FIX.4.3 9=118 35=A 34=1 49=ABCMKD</message>
</json>
you could query this with the following in QConsole:
xquery version "1.0-ml";
import module namespace json = "http://marklogic.com/xdmp/json"
    at "/MarkLogic/json/json.xqy";
declare namespace jbasic = "http://marklogic.com/xdmp/json/basic";

cts:search(fn:doc(), cts:and-query((
    cts:element-word-query(xs:QName("jbasic:message"), "35=A", "wildcarded"),
    cts:element-word-query(xs:QName("jbasic:message"), "49=ABC*", "wildcarded")
)))
In the Tutorial section on docs.marklogic.com you can find samples showing how to do that with the Java API. Check this link: https://developer.marklogic.com/learn/java/custom-search#search-using-an-element-word-constraint.
HTH,
Peter
Upvotes: 1
Reputation: 7335
The universal index stores the words from the values of properties. By default, a word is a contiguous run of numbers or letters (i.e., delimited by spaces or punctuation).
In the example above, the default words in the index for the message property would be:
8, FIX, 4, 3, 9, 118, 35, A, 34, 1, 49, ABCMKD
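To see how the message from the question breaks into words under that rule, here is a minimal sketch in plain Java. It only approximates the "run of letters or numbers" description with a regular expression; it is not MarkLogic's tokenizer, and the class and method names (WordTokens, tokenize) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordTokens {
    // Approximation only: one word = one contiguous run of letters or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Returns the words of the text in order of appearance.
    public static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // The "=" and "." characters act as delimiters, so "34=1" is two words.
        System.out.println(tokenize("8=FIX.4.3 9=118 35=A 34=1 49=ABCMKD"));
    }
}
```

Because "=" is a delimiter here, a tag and its value end up as separate words, which is why a plain word query cannot tell "34=1" apart from a message that merely contains both "34" and "1".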
You can query for the and-related combination of such words within the message property, as in:
qb.and(
    qb.word(qb.jsonProperty("message"), "34"),
    qb.word(qb.jsonProperty("message"), "1"),
    qb.word(qb.jsonProperty("message"), "49"),
    qb.word(qb.jsonProperty("message"), "ABCMKD")
)
That's likely to match many messages that don't interest you, because the query only requires each word to appear somewhere in the message, in any order.
The first step to make these documents queryable is to expose the fields hidden within the message string as JSON properties:
{"id": 233457657, "message": {
"m8": "FIX.4.3",
"m9": "118",
"m35": "A",
"m34": "1",
"m49": "ABCMKD"
}}
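One way to produce that structure before loading the documents is sketched below in plain Java. It assumes the tag=value pairs are separated by whitespace, as in the question's example (raw FIX normally uses the SOH character, 0x01, as the delimiter), and the names FixFields and parse are hypothetical. Serializing the resulting map to JSON is left to whatever JSON library you already use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FixFields {
    // Splits "tag=value" pairs on whitespace and prefixes each tag with "m",
    // mirroring the property names shown above. Insertion order is preserved.
    public static Map<String, String> parse(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : message.trim().split("\\s+")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip anything that is not of the form tag=value
            }
            fields.put("m" + pair.substring(0, eq), pair.substring(eq + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("8=FIX.4.3 9=118 35=A 34=1 49=ABCMKD"));
    }
}
```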
You can then match those properties individually with qb.value() queries.
If necessary, you can configure the database to support wildcards on the values of those properties.
Finally, if at all possible, you should upgrade to MarkLogic 8, which has native support for JSON. MarkLogic 6 and 7 support JSON through an internal transform to XML, which does not perform as well.
Addition to respond to the comment ...
If the and-related word queries shown above have infrequent false positives, you could filter them out on the client side -- that is, query for a larger page of documents than you need, inspect the message property on the client with a regex, and throw away the documents that you don't need.
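The client-side filtering step could look roughly like the following sketch. It works on message strings that have already been fetched and extracted from the result documents; how you page through results and read the message property is not shown, and the names MessageFilter and keepMatching are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class MessageFilter {
    // Keeps only the messages that contain the substring literally.
    // Pattern.quote stops characters such as "." from acting as regex syntax.
    public static List<String> keepMatching(List<String> messages, String substring) {
        Pattern p = Pattern.compile(Pattern.quote(substring));
        List<String> kept = new ArrayList<>();
        for (String msg : messages) {
            if (p.matcher(msg).find()) {
                kept.add(msg);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
            "8=FIX.4.3 9=118 35=A 34=1 49=ABCMKD",        // wanted
            "8=FIX.4.3 9=34 35=A 34=2 49=ABCMKD 58=1");   // has the same words, wrong pairs
        System.out.println(keepMatching(page, "34=1 49=ABC"));
    }
}
```

The second message contains the words 34, 1, 49 and ABCMKD, so the and-related word queries would return it, but it is dropped here because the literal substring is absent.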
If there are too many false positives for that approach, you could use the Admin UI to create a field index on the message XML element that's stored for the JSON property. (Use Query Console to explore the database and find out the namespace for the XML.) In the Field index, turn off stemmed, phrased, case sensitive, and diacritic sensitive searches. Turn on word, trailing wildcard, and one character searches. (Don't turn on positions.)
After reindexing, do the word queries on the field, similar to:
String[] wordQueryOptions =
    {"punctuation-sensitive", "space-sensitive", "wildcarded"};
...
qb.and(
    qb.word(qb.field("yourMsgField"), FragmentScope.DOCUMENTS,
        wordQueryOptions, 1.0, "34=1"),
    qb.word(qb.field("yourMsgField"), FragmentScope.DOCUMENTS,
        wordQueryOptions, 1.0, "49=ABC*")
)
Finally, modify the query options used for your query to specify "filtered" as a search option.
Hoping that helps,
Upvotes: 4