TJ Tang


wildcard searches on specific elements only

I am looking for a way to do wildcard search only on specific elements when executing a search:search. Specifically, I might have documents that look like the following:

<pdbe:person-envelope xmlns:pdbe="http://schemas.abbvienet.com/people-db/envelope">
  <person xmlns="http://schemas.abbvienet.com/people-db/model">
    <costcenter>
      <code>0000601775</code>
      <name>DISC-PLAT INFORM</name>
    </costcenter>
    <displayName>Tj Tang</displayName>
    <upi>10025613</upi>
    <firstName>
      <preferred>TJ</preferred>
      <given>Tze-John</given>
    </firstName>
    <lastName>
      <preferred>Tang</preferred>
      <given>Tang</given>
    </lastName>
    <title>Principal Research Scientist</title>
  </person>
  <pdbe:raw/>
</pdbe:person-envelope>

When a search runs, I want the search text to be wildcarded automatically, but only for certain elements such as displayName, firstName, and lastName, and NOT for upi or code. As I understand it, I would enable the relevant wildcard indexes in the database and would then need a custom query parser that rewrites the query into multiple cts:element-query and cts:element-value-query statements, one for each element I want to wildcard, OR'd with the originally parsed search query. Alternatively, I could create field constraints and rewrite the query to use those field constraints.

Is there another way to apply wildcards conditionally, on some elements but not others, when the user enters a simple search query? For example, a partial first and last name such as "TJ Tan" should match, but searching for "100256" should produce no partial hits.

Upvotes: 3

Views: 144

Answers (1)

You are on the right track. Let's take an element (or maybe field) query on "TJ Tan".

With cts:tokenize, you can break this up (read about cts:tokenize; it is not just a normal tokenizer).

Then I have "TJ" and "Tan".
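As a minimal sketch of that tokenizing step (the query text is taken from the question; nothing else is assumed beyond a MarkLogic XQuery context), the word tokens can be separated from the space and punctuation tokens that cts:tokenize also returns:

```xquery
xquery version "1.0-ml";

(: cts:tokenize returns typed tokens (cts:word, cts:space, cts:punctuation).
   Keep only the words and turn them into plain strings. :)
let $qtext := "TJ Tan"
for $token in cts:tokenize($qtext)
where $token instance of cts:word
return fn:string($token)
```

Filtering on the token type is what makes this more useful than splitting on whitespace, since punctuation in the user's input does not end up inside a search term.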

You can then do things like apply business rules to decide which words should be wildcarded and which should not, and build the appropriate cts query (probably individual word queries in an and-query, or a near-query; tuning depends on your needs).
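One way such a rule might look is sketched below. The rule itself (all-digit tokens are matched exactly against upi and code, everything else gets a trailing wildcard inside the name elements) and the local:word-query helper are assumptions made for illustration, not something from the question. The sketch also assumes the database has suitable wildcard indexes enabled so the wildcarded terms resolve efficiently.

```xquery
xquery version "1.0-ml";

declare namespace p = "http://schemas.abbvienet.com/people-db/model";

(: Hypothetical business rule: digits-only tokens are looked up as exact
   values in upi/code; any other token is wildcarded in the name elements. :)
declare function local:word-query($word as xs:string) as cts:query
{
  if (fn:matches($word, "^[0-9]+$")) then
    cts:element-value-query(
      (xs:QName("p:upi"), xs:QName("p:code")),
      $word,
      "unwildcarded")
  else
    (: cts:element-query is used so that text in child elements such as
       preferred and given is also searched. :)
    cts:element-query(
      (xs:QName("p:displayName"), xs:QName("p:firstName"), xs:QName("p:lastName")),
      cts:word-query(fn:concat($word, "*"), ("wildcarded", "case-insensitive")))
};

let $words :=
  for $token in cts:tokenize("TJ Tan")
  where $token instance of cts:word
  return fn:string($token)
return
  cts:search(
    fn:doc(),
    cts:and-query(for $w in $words return local:word-query($w)))
```

With this rule, "100256" becomes an exact value query, so it would not partially match a upi such as 10025613.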

With the search phrase tokenized, you may also find that building your results relies not on a wildcard index but on an element word lexicon: you do your term expansion with word matches (for example cts:element-word-match), and the expanded terms are then sent to the query.
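A rough sketch of the lexicon approach follows. It assumes an element word lexicon has been configured on displayName (that configuration is an assumption; the question does not say which lexicons exist), and the limit of 50 expanded terms is an arbitrary illustrative value.

```xquery
xquery version "1.0-ml";

declare namespace p = "http://schemas.abbvienet.com/people-db/model";

(: Requires an element word lexicon on p:displayName (assumed here). :)
let $element := xs:QName("p:displayName")

(: Expand the partial term into the actual words found in the lexicon. :)
let $expanded :=
  cts:element-word-match($element, "Tan*", ("case-insensitive", "limit=50"))

return
  if (fn:empty($expanded)) then
    ()
  else
    (: Send the expanded terms to the query as plain, unwildcarded words. :)
    cts:search(
      fn:doc(),
      cts:element-word-query($element, $expanded, "unwildcarded"))
```

Because the expansion happens up front, the final query contains only literal words, which also makes it easy to inspect or cap the terms before searching.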

We sometimes take that further and combine the query building with xdmp:estimate and make the query less restrictive if we don't get enough results early on.
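The fallback idea could be sketched roughly as follows. The threshold of 5 and the choice of displayName as the only element are illustrative assumptions; the strict query matches whole words and the relaxed one adds trailing wildcards.

```xquery
xquery version "1.0-ml";

declare namespace p = "http://schemas.abbvienet.com/people-db/model";

let $min-results := 5 (: hypothetical threshold :)
let $element := xs:QName("p:displayName")
let $words := ("TJ", "Tan")

(: Strict: every word must match as a whole word. :)
let $strict :=
  cts:and-query(
    for $w in $words
    return cts:element-word-query($element, $w, "unwildcarded"))

(: Relaxed: every word is treated as a prefix. :)
let $relaxed :=
  cts:and-query(
    for $w in $words
    return cts:element-word-query($element, fn:concat($w, "*"), "wildcarded"))

(: xdmp:estimate answers from the indexes without fetching documents,
   so it is a cheap way to decide which query to run. :)
let $query :=
  if (xdmp:estimate(cts:search(fn:doc(), $strict)) ge $min-results)
  then $strict
  else $relaxed

return cts:search(fn:doc(), $query)
```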

Where to put this logic? You mention search:search, so in this case, I would suggest you package this into a custom constraint.
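A hedged sketch of how a custom constraint might be wired into search:search is shown below. The constraint name, the module namespace http://example.com/name-constraint, and the module path /lib/name-constraint.xqy are all hypothetical; that module would need to provide the parse function that tokenizes the text and builds the cts query.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: The "name" constraint delegates query building to a custom parse
   function in a hypothetical library module. :)
search:search(
  'name:"TJ Tan"',
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="name">
      <custom facet="false">
        <parse apply="parse"
               ns="http://example.com/name-constraint"
               at="/lib/name-constraint.xqy"/>
      </custom>
    </constraint>
  </options>)
```

Note that in this form the user has to type the constraint prefix (name:) for the custom logic to apply; unprefixed terms would still go through the default term handling.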

Upvotes: 5
