Reputation: 105
I have a number of documents in Cloudant, that have ID field of type string. ID can be a simple string, like "aaa", "bbb" or number stored as text, e.g. "111", "222", etc. I need to be able to full text search using the above field, but I encountered some problems.
Assuming that I have two documents, having ID="aaa" and ID="111", then searching with query:
returns first document, as expected
returns nothing, but
returns second document, so at least there is a way to retrieve it.
Unfortunately, when searching for range:
I get no results, and I have no idea what to do to get around this problem. Is there any special syntax for such case?
UPDATE:
Index function:
function(doc){
if(!doc.ID) return;
index("ID", doc.ID, { index:'not_analyzed_no_norms', store:true });
}
Changing index to analyzed doesn't help. Analyzer itself is keyword, but changing to standard doesn't help either.
UPDATE 2
Just to add some more context, because I think I missed one key point. The field I'm indexing will be searched using ranges, and both min and max values can be provided by user. So it is possible that one of them will be number stored as a string, while other will be a standard non-numeric text. For example search all document where ID >= "11" and ID <= "foo".
Assumig that database contains documents with ID "1", "5", "alpha", "beta", "gamma", this query should return "5", "alpha", "beta". Please note that "5" should actually be returned, because string "5" is greater than string "11".
Upvotes: 2
Views: 1267
Reputation: 405
This could also be achieved using regular expressions in queries. Something line this:
curl -X POST "https://.../facetrangetest/_design/ddoc/_search/f" -d '{"q":"ID:/<23-758>/"}' | jq .
This regular expressions means to retrieve all documents with ID field from 23 to 758. Slashes: /
/
are used to enclose a regular expression; the interval is enclosed inside <>
.
Upvotes: 1
Reputation: 405
First of all, I would suggest to use keyword analyzer, so you can control the right tokenization during both indexing and search.
"analyzer": "keyword",
"index": "function(doc){\n if(!doc.ID) return;\n index(\"ID\", doc.ID, {store:true });\n}
To retrieve you document with _id "111", use the following range query:
curl -X GET "http://.../facetrangetest/_design/ddoc/_search/f?q=ID:\[111%20TO%A\]"
If you use a query q=ID:\[111%20TO%20999\]
, Cloudant search seeing numbers on both size of the range, will interpret it as NumericRangeQuery; and since your ID of "111" is a String, it will not be part of the results returned. Including a string into query [111%20TO%20A], will make Cloudant interpret it as a range query on strings.
Upvotes: 2
Reputation: 105
Our team just came to a workaround solution. We managed to get proper results by adding some arbitrary character, e.g. 'a' to an upper range value, and by introducing additional search term, to exclude documents having ID between upper range value and upper range value + 'a'.
When searching for a range
ID:[X TO Y]
actual query would be
(ID:[X TO Ya] AND -ID:{Y TO Ya])
For example, to find a documents having ID between 23 and 758, we execute
(ID:[23 TO 758a] AND -ID:{758 TO 758a]).
Upvotes: 2
Reputation: 544
You can get both docs returned like this:
q=ID:["111" TO "CCC"]
Here's a working live example:
I found something quirky. It seems that range queries on strings only work if at least one of the range values is a string. Querying on ID:["111" TO "555"]
doesn't return anything either, so maybe this is resolving to a numeric query somehow? Could be a bug.
Upvotes: 1