How can I find an exact expression in all documents in Vespa?
I was trying to find a citation of a specific document using an exact expression but got 0 results. I have several documents with text containing the expression "Document 23/2010".
I tried running the following query:
vespa query 'yql=select title from Documents where text contains "\"Document 23/2010\"" LIMIT 10'
I also tried using grammar: phrase, without success.
The query above should work as-is for string fields that use index, for example:
vespa query 'yql=select * from doc where text contains "\"Document 23/2010\""'
{
"root": {
"id": "toplevel",
"relevance": 1.0,
"fields": {
"totalCount": 1
},
"coverage": {
"coverage": 100,
"documents": 1,
"full": true,
"nodes": 1,
"results": 1,
"resultsFull": 1
},
"children": [
{
"id": "id:doc:doc::1",
"relevance": 0.15974580091895013,
"source": "text",
"fields": {
"sddocname": "doc",
"documentid": "id:doc:doc::1",
"text": [
"Foo Bar \"Document 23/2010\" Bar"
]
}
}
]
}
}
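If you call the HTTP query API directly instead of using the vespa CLI, the same YQL, including the escaped inner quotes, has to be URL-encoded. The following is a minimal Python sketch, not an official client. It assumes a Vespa container on the hypothetical endpoint http://localhost:8080/search/ and reuses the doc document type and text field from the example above. The helper names build_phrase_yql and search_url are illustrative only.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host/port to your own Vespa container.
VESPA_SEARCH_URL = "http://localhost:8080/search/"


def build_phrase_yql(doc_type: str, field: str, phrase: str) -> str:
    """Return YQL that searches `field` for `phrase`, quoted as in the CLI example."""
    if '"' in phrase or "\\" in phrase:
        # Keep the sketch simple: nested quotes/backslashes would need extra escaping.
        raise ValueError("phrase must not contain quotes or backslashes")
    # The outer quotes delimit the YQL string; the escaped inner quotes (\")
    # are passed on to the query parser around the phrase text.
    return f'select * from {doc_type} where {field} contains "\\"{phrase}\\""'


def search_url(base: str, yql: str, hits: int = 10,
               trace_level: Optional[int] = None) -> str:
    """URL-encode the query parameters for an HTTP GET against the search endpoint."""
    params = {"yql": yql, "hits": hits}
    if trace_level is not None:
        params["trace.level"] = trace_level  # request the query trace
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    yql = build_phrase_yql("doc", "text", "Document 23/2010")
    print(yql)
    print(search_url(VESPA_SEARCH_URL, yql, hits=10, trace_level=3))
```

The generated YQL string is identical to the one passed to vespa query above. Only the transport changes.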
If you add trace.level=3 to the query, the response shows how the query is parsed and executed against the backend. The relevant trace messages:
{
"message": "sc0.num0 dispatch response: Result (1 of total 1 hits)"
},
{
"message": "sc0.num0 fill to dispatch: query=[text:'document 23 2010'] timeout=9998ms offset=0 hits=10 groupingSessionCache=true sessionId=5304f5d0-6cd3-4dc4-be2e-666829413231.1708027994706.5.default grouping=0 : restrict=[doc] summary=[null]"
},
{
"message": "Current state of query tree: SPHRASE[explicit=false index=\"text\" isFromQuery=true isFromUser=true locked=true rawWord=\"\\\"Document 23/2010\\\"\" stemmed=true uniqueID=1]{\n WORD[fromSegmented=false index=\"text\" origin=null segmentIndex=0 stemmed=true words=true]{\n \"document\"\n }\n WORD[fromSegmented=false index=\"text\" origin=null segmentIndex=0 stemmed=true words=true]{\n \"23\"\n }\n WORD[fromSegmented=false index=\"text\" origin=null segmentIndex=0 stemmed=true words=true]{\n \"2010\"\n }\n}\n"
},
{
"message": "YQL+ representation: select * from doc where text contains ({origin: {original: \"\\\"Document 23\\/2010\\\"\", offset: 0, length: 18}, id: 1}phrase(\"document\", \"23\", \"2010\")) timeout 9998"
},
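Here you can see that the quoted string is parsed into a phrase (SPHRASE) of the three tokens "document", "23" and "2010", so the query runs as a phrase search.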
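A phrase such as the SPHRASE node in the trace matches only when its tokens appear next to each other and in the given order. The following simplified Python model of that rule works on token lists that are already tokenized. It illustrates phrase semantics in general and is not Vespa's matching code. The document token list is written by hand as an assumption about what the tokenizer produces for the example text.

```python
from typing import Sequence


def phrase_matches(doc_tokens: Sequence[str], phrase_tokens: Sequence[str]) -> bool:
    """Return True if phrase_tokens occur in doc_tokens consecutively and in order."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    target = list(phrase_tokens)
    # Slide a window of the phrase length over the document tokens.
    return any(list(doc_tokens[i:i + n]) == target
               for i in range(len(doc_tokens) - n + 1))


# Hand-written tokens for 'Foo Bar "Document 23/2010" Bar' (assumed, lowercased).
doc = ["foo", "bar", "document", "23", "2010", "bar"]
print(phrase_matches(doc, ["document", "23", "2010"]))  # True: adjacent, in order
print(phrase_matches(doc, ["2010", "23", "document"]))  # False: wrong order
```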
With index, characters such as " and / are not searchable; the tokenizer removes them, which is why the query above reaches the backend as text:'document 23 2010'.
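To see why the quotes and the slash never reach the index, the following Python function is a deliberately simplified stand-in for a tokenizer. It is an assumption, not Vespa's real linguistics rules. It lowercases the text and keeps only runs of letters and digits, which reproduces the tokens shown in the trace for "Document 23/2010".

```python
import re


def simple_tokenize(text: str) -> list[str]:
    """Simplified stand-in for a tokenizer: lowercase, keep runs of letters/digits."""
    # [^\W_] matches a Unicode letter or digit; quotes, slashes and spaces
    # separate tokens and are dropped rather than indexed.
    return re.findall(r"[^\W_]+", text.lower())


print(simple_tokenize('"Document 23/2010"'))             # ['document', '23', '2010']
print(simple_tokenize('Foo Bar "Document 23/2010" Bar'))  # ['foo', 'bar', 'document', '23', '2010', 'bar']
```

Under this simplified model, a variant like "Document 23-2010" yields the same three tokens. The punctuation inside the expression therefore cannot distinguish one citation form from another.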