Reputation: 2852
The following is the XML structure:
<Docs>
<Doc>
<Name>Doc 1</Name>
<Notes>
<specialNote>
This is a special note section.
<B>This B Tag is used for highlighting any text and is optional</B>
<U>This U Tag will underline any text and is optional</U>
<I>This I Tag is used for highlighting any text and is optional</I>
</specialNote>
<generalNote>
<P>
This will store the general notes and might have number of paragraphs. This is para no 1. NO Child Tags here
</P>
<P>
This is para no 2
</P>
</generalNote>
</Notes>
<Desc>
<P>
This is used for Description and might have number of paragraphs. Here too, there will be B, U and I Tags for highlighting the description text and are optional
<B>Bold</B>
<I>Italic</I>
<U>Underline</U>
</P>
<P>
This is description para no 2 with I and U Tags
<I>Italic</I>
<U>Underline</U>
</P>
</Desc>
</Doc>
</Docs>
There will be thousands of Doc tags. I want to give the user a search criterion where they can search for WORD1 and NOT WORD2. This is the query:
for $x in doc('Documents')/Docs/Doc[Notes/specialNote/text() contains text 'Tom' ftand ftnot 'jerry' or
Notes/specialNote/B/text() contains text 'Tom' ftand ftnot 'jerry' or
Notes/specialNote/I/text() contains text 'Tom' ftand ftnot 'jerry' or
Notes/specialNote/U/text() contains text 'Tom' ftand ftnot 'jerry' or
Notes/generalNote/P/text() contains text 'Tom' ftand ftnot 'jerry' or
Desc/P/text() contains text 'Tom' ftand ftnot 'jerry' or
Desc/P/B/text() contains text 'Tom' ftand ftnot 'jerry' or
Desc/P/I/text() contains text 'Tom' ftand ftnot 'jerry' or
Desc/P/U/text() contains text 'Tom' ftand ftnot 'jerry']
return $x/Name
The result of this query is wrong: it contains some docs that have both Tom and jerry. So I changed the query to:
for $x in doc('Documents')/Docs/Doc[. contains text 'Tom' ftand ftnot 'jerry']
return $x/Name
This query gives me exactly the right result, i.e. only those docs with Tom and not jerry, but it takes a huge amount of time: approx. 45 secs, whereas the earlier one took 10 secs!
I am using BaseX 7.5 XML Database.
Need expert comments on this :)
Upvotes: 4
Views: 522
Reputation: 4241
The first query tests each text node in the document separately, so <P><B>Tom</B> and <I>Jerry</I></P> would match because the first text node contains Tom but not Jerry.
In the second query, the full-text search is performed on all the text contents of each Doc element as if they were concatenated into one string. This cannot (currently) be answered by BaseX's full-text index, which indexes each text node separately, so the query falls back to scanning the documents.
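To make the difference concrete, here is a small sketch you can paste into BaseX. It uses an inline `<P>` element (made up for illustration, not taken from your database) and evaluates the same full-text selection once per text node and once on the whole element:

```xquery
(: Hypothetical sample element; its text nodes are "Tom", " and ", "Jerry" :)
let $p := <P><B>Tom</B> and <I>Jerry</I></P>
return (
  (: per text node: the node "Tom" alone satisfies the selection, so this is true :)
  $p//text() contains text 'Tom' ftand ftnot 'Jerry',
  (: whole element: the string value "Tom and Jerry" contains both words, so this is false :)
  $p contains text 'Tom' ftand ftnot 'Jerry'
)
```

The first expression is what each branch of your long query does, which is why docs containing both words slip through.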
A solution would be to perform the full-text search for each term separately and merge the results at the end. Each search can then be evaluated per text node, so the index can be used:
for $x in (doc('Documents')/Docs/Doc[.//text() contains text 'Tom']
except doc('Documents')/Docs/Doc[.//text() contains text 'Jerry'])
return $x/Name
The above query is rewritten by the query optimizer to this equivalent one using two index accesses:
for $x in (db:fulltext("Documents", "Tom")/ancestor::*:Doc
except db:fulltext("Documents", "Jerry")/ancestor::*:Doc)
return $x/Name
If you want, you can even tweak the order in which the results are merged to keep the intermediate results small.
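Since you want the user to supply WORD1 and WORD2, the same idea could be parameterized roughly like this. This is only a sketch: the variable names `$include` and `$exclude` are made up, the default values on the external variables assume XQuery 3.0 support, and I have not checked whether the optimizer still rewrites the predicates to index accesses when the search terms are not string literals, so look at the query info output to confirm.

```xquery
(: Hypothetical external variables; bind them from your application :)
declare variable $include as xs:string external := 'Tom';
declare variable $exclude as xs:string external := 'jerry';

(: Docs with a text node containing $include, minus docs with a text node containing $exclude :)
for $x in (doc('Documents')/Docs/Doc[.//text() contains text { $include }]
  except doc('Documents')/Docs/Doc[.//text() contains text { $exclude }])
return $x/Name
```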
Upvotes: 4