John

Reputation: 2852

XQuery Full text searching over mixed content

The following is the XML structure. (I have given only a very small piece of the entire document, with limited data. The full XML database is 6 GB and has a proper full-text index.)

<Docs>
 <Doc>
<Chap>No - 1</Chap>
<Desc>
  <Notes>
    <Para t="sn">departmental report</Para>
  </Notes>
  <Notes>
    <Para t="sn">The equiry commission is good.</Para>
  </Notes>
  <Notes>
    <Para t="sn">departmental process</Para>
    <Para t="ln">The enquiry report for the bomb blast is yet to come.<bL/>
      <bL/>The department working on this is quite lazy.</Para>
  </Notes>
</Desc>
</Doc>
<Doc>
<Chap>No - 2</Chap>
<Desc>
  <Notes>
    <Para t="sn">Enquiry Processes Report</Para>
    <Para t="ln">The enquiry process is very simple.<bL/>
      <bL/>With proper guidance anybody can handle the commission easily.<bL/>
      <bL/>
    </Para>
  </Notes>
  <Notes>
    <Para t="sn">Enquiry - Departmental</Para>
  </Notes>
</Desc>
 </Doc>
 <Doc>
<Chap>No - 3</Chap>
<Desc>
  <Notes>
    <Para t="sn">Physics Department</Para>
  </Notes>
  <Notes>
    <Para t="sn">Working process of physics department is quite lengthy</Para>
    <Para t="ln">Even after proper enquiry, I was told nothing.<bL/>
      <bL/>This was like a bomb blast.</Para>
  </Notes>
  <Notes>
    <Para t="sn">Departmental enquiry.</Para>
    <Para t="ln">There should be a departmental enquiry for this wrong process.</Para>
  </Notes>
</Desc>
</Doc>
</Docs>

Now I want all the Chap nodes containing all of the words "departmental", "enquiry" and "report".

So far, I have been unable to get them using various combinations. One of my attempts is:

for $x in ft:search("Docs", ("departmental enquiry report"), map{'mode':='all words'})/ancestor::*:Para
 return $x/ancestor::Chap

Can anybody guide me on this?

Upvotes: 3

Views: 592

Answers (2)

Christian Grün

Reputation: 6229

The full-text index of BaseX references all terms on text node level. This means that all of your words would need to occur in the same text node.

If you want to take advantage of the full-text query and find all words that occur below a certain element, you could try the following query:

let $words := ("departmental enquiry report")
for $doc in db:open("Docs")//Doc[.//text() contains text { $words } any word]
where $doc[string-join(.//text(), ' ') contains text { $words } all words]
return $doc/Chap

The first contains text expression will be rewritten to an index request. It will return all Doc elements that have a text node containing any of the searched words. The contains text expression in the where clause will then filter out all nodes that do not contain all of your query terms. With string-join(.//text(), ' '), all text nodes below the Doc element are concatenated, and the search is performed on the joined string.

The following, equivalent representation of the query should yield the same results:

let $words := ("departmental enquiry report")
for $x in ft:search("Docs", $words, map { 'mode': 'any word' })/ancestor::*:Doc
where ft:contains(string-join($x//text(), ' '), $words, map { 'mode': 'all words' })
return $x/Chap
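
Both queries pass a single string and rely on the any word / all words modes to split it into individual tokens. A minimal sketch of that behaviour, assuming a processor with XQuery Full Text support such as BaseX (the $text value is a made-up sample taken from the question's data):

```xquery
let $words := "departmental enquiry report"
let $text  := "departmental report"
return (
  (: "any word": at least one token of $words occurs in $text :)
  $text contains text { $words } any word,
  (: "all words": every token of $words must occur; "enquiry" is missing here :)
  $text contains text { $words } all words
)
```

This should evaluate to true followed by false, which is why the index lookup uses any word as a cheap pre-filter and the all words check is applied to the joined string afterwards.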

Upvotes: 1

Jens Erat

Reputation: 38682

ft:search, and Why It Will Not Solve the Issue

If you look at the BaseX XQuery Full Text documentation, you will see that the second argument of ft:search should be a sequence of words:

ft:search($db as xs:string, $terms as item()*, $options as item()) as text()*

So, your query should look something like

for $x in ft:search("Docs", ("departmental", "enquiry", "report"), map { 'mode': 'all words' })/ancestor::*:Para
return $x/ancestor::Chap

Yet this still will not solve your issue, as this function

[re]turns all text nodes from the full-text index of the database $db that contain the specified $terms.

In other words: all of these words would have to occur in a single text node, but in your example input they are spread over multiple text nodes (all over a <Doc/> node).
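
To illustrate the difference, here is a minimal sketch that runs without a database. The inline $doc fragment is a made-up sample modeled on the question's data, not part of the original database, and the query assumes a processor with XQuery Full Text support such as BaseX:

```xquery
let $doc :=
  <Doc>
    <Chap>No - 1</Chap>
    <Notes><Para>departmental report</Para></Notes>
    <Notes><Para>The enquiry report is yet to come.</Para></Notes>
  </Doc>
return (
  (: per text node: no single text node holds all three words :)
  exists($doc//text()[. contains text { 'departmental', 'enquiry', 'report' } all words]),
  (: whole element: the words may be spread over several text nodes :)
  $doc contains text { 'departmental', 'enquiry', 'report' } all words
)
```

This should evaluate to false followed by true: the per-text-node check mirrors what the index lookup does, while the second expression matches against the string value of the whole element.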

Using Standard XQuery Full Text

I had to guess from the input and words you're searching for that you actually want to search for <Doc/> nodes that contain all these three words.

for $document in doc("Docs")/Docs/Doc
where $document contains text { 'departmental', 'enquiry', 'report' } all words
return $document/Chap

This will iterate over all <Doc/> elements, apply a full-text search to each of them, and return the chapter node of every match.

Be aware that

  • I removed the namespace wildcard, as no namespaces are included in your example document, and
  • you should create a full-text index (if you have not done so yet), which will greatly improve performance.

Upvotes: 1
