Reputation: 7133
I don't understand this example: why do the following two queries both return 12 results? The post says it is because of the way data is indexed in the _all field, but it doesn't go on to explain how. Can someone please help me understand this?
GET /_search?q=2014 # 12 results
GET /_search?q=2014-09-15 # 12 results !
Upvotes: 0
Views: 963
Reputation: 2911
Suppose you have documents like this:
{
"name": "John Doe",
"occupation": "Farmer",
"favorite_ice_cream": "chocolate"
}
{
"name": "Jane Doe",
"occupation": "Doctor",
"favorite_ice_cream": "vanilla"
}
And also suppose that the favorite_ice_cream field is not analyzed. Fields that are not analyzed are highly cacheable and easy to perform aggregations on (so it's very easy to count how many people like chocolate ice cream, for example, vs vanilla). But fields that are not analyzed are not full-text searchable: each value is indexed as one exact term rather than broken into individual words.
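To make the "count how many people like chocolate" part concrete, here is a minimal sketch of a terms aggregation request body, built as a plain Python dict. The helper name and the aggregation name `ice_cream_counts` are made up for illustration; the field name comes from the example documents above.

```python
import json

def terms_aggregation_body(field, agg_name="by_value"):
    """Build a search body that counts documents per exact value of `field`."""
    return {
        "size": 0,  # skip the hits, return only the aggregation buckets
        "aggs": {agg_name: {"terms": {"field": field}}},
    }

# Hypothetical aggregation name; the field is the one from the sample documents.
body = terms_aggregation_body("favorite_ice_cream", agg_name="ice_cream_counts")
print(json.dumps(body, indent=2))
```

Sending that body to the _search endpoint should return one bucket per distinct value (chocolate, vanilla, ...) with a document count for each.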
But... by default Elasticsearch takes all the fields in a document, crams them together into an _all field, and analyzes that with Lucene. So, for the first document, Elasticsearch will analyze the string "John Doe Farmer chocolate" and for the second document, it will analyze "Jane Doe Doctor vanilla". As a consequence, when you submit a query like the one you did above, you are searching the _all field. You can (for example) search for
GET /_search?q=chocolate
and see that John Doe likes chocolate ice cream. You could also submit a query string query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html) to search the _all field and still figure out who likes chocolate. You could not, however, use a match query to do a full-text search inside the favorite_ice_cream field... after all, we told Lucene not to analyze that field, so only the exact, whole value matches. You could, however, use a filter on the field and bring back all the documents for which favorite_ice_cream is equal to chocolate.
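To tie this back to the two date queries in the question, here is a rough Python simulation of what happens in the _all field. It is only a sketch: `analyze` is a crude stand-in for the standard analyzer (it lowercases and splits on anything that isn't a letter or digit), the `joined` date field is a hypothetical addition to the sample documents, and the search uses OR semantics, which is the default for this kind of query string.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def build_all(doc):
    """Concatenate every field value into one string, like the _all field."""
    return " ".join(str(value) for value in doc.values())

def search_all(docs, query):
    """Return docs whose _all tokens share at least one token with the query (OR)."""
    wanted = set(analyze(query))
    return [doc for doc in docs if wanted & set(analyze(build_all(doc)))]

# "joined" is a hypothetical date field added for illustration.
docs = [
    {"name": "John Doe", "occupation": "Farmer",
     "favorite_ice_cream": "chocolate", "joined": "2014-09-15"},
    {"name": "Jane Doe", "occupation": "Doctor",
     "favorite_ice_cream": "vanilla", "joined": "2014-10-01"},
]

print(analyze("2014-09-15"))                 # the date is split into three tokens
print(len(search_all(docs, "chocolate")))    # only John matches
print(len(search_all(docs, "2014")))         # both documents contain the token 2014
print(len(search_all(docs, "2014-09-15")))   # 2014 OR 09 OR 15: still both documents
```

Inside _all the date is just a string, so 2014-09-15 becomes the tokens 2014, 09 and 15, and any document containing 2014 matches either query. That would explain why both of your searches return the same 12 documents.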
It's kind of rough to get used to at first... but the docs are good and not confusing so long as you make sure you keep an eye on which version of the docs you're reading.
Also, if it helps, I like to think of the _all field as sort of a get-out-of-jail-free card. A lot of times I might choose not to analyze a field because I'll want to run aggregations on it or apply filters. And while I usually recall which value I need for a filter, sometimes it's useful to be able to submit a search to the _all field and make sure... So for example, if I can't recall whether my "country" field has "United States" or "United States of America" as the value for the US, I can quickly run a query against the _all field, look at a few documents, and then pick the appropriate filter value.
Another way that I've used the _all field is in full-text search where I want to boost matches on certain fields much higher, but I also want to search all of the fields in the document in case something happens to match. A query string query against _all works great in those circumstances.
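A minimal sketch of what such a request body might look like, again as a Python dict. The helper name, the boost of 3 on `name`, and the choice to list _all explicitly in `fields` are my assumptions for illustration, and it only applies to Elasticsearch versions where the _all field is enabled.

```python
import json

def boosted_query_string(text, boosts):
    """Build a query_string query over boosted fields plus _all as a catch-all."""
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    fields.append("_all")  # unboosted fallback so any other field can still match
    return {"query": {"query_string": {"query": text, "fields": fields}}}

# Hypothetical boost: matches on "name" count three times as much.
body = boosted_query_string("doe", {"name": 3})
print(json.dumps(body, indent=2))
```

Matches in the boosted field score higher, while a hit anywhere else in the document still surfaces through _all.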
You can learn more about the _all field here: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
Hope this is enough to get you started... you probably don't want to submit simple queries the way you are. Probably, you're gonna want to submit POST requests that utilize the full query DSL. You can learn more about that here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html
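For what it's worth, here is a small sketch of building such a POST request in Python using only the standard library. The host `http://localhost:9200` and the helper name are assumptions, and the body uses a bool query with a `filter` clause, which assumes a version of Elasticsearch that supports it (2.0 or later). The request is built but not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

def build_search_request(base_url, body):
    """Wrap a query DSL body in a POST request to the _search endpoint (not sent here)."""
    return urllib.request.Request(
        url=f"{base_url}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Full-text match on the analyzed name field, exact-value filter on the
# favorite_ice_cream field that is not analyzed.
body = {
    "query": {
        "bool": {
            "must": {"match": {"name": "doe"}},
            "filter": {"term": {"favorite_ice_cream": "chocolate"}},
        }
    }
}

# Assumed local node address; adjust for your cluster.
request = build_search_request("http://localhost:9200", body)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```

The match clause runs in query context and affects scoring, while the term clause runs in filter context and only includes or excludes documents.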
Best of luck!
Upvotes: 3