Reputation: 3299
I'm using ElasticSearch to index forum threads and reply posts. Each post has a date field associated with it. I'd like to run a query with a date range that returns the threads containing posts within that range. I've looked at using a nested mapping but the docs say the feature is experimental and may lead to inaccurate results.
What's the best way to accomplish this? I'm using the Java API.
Upvotes: 4
Views: 20848
Reputation: 17319
You haven't said much about your data structure, but I'm inferring from your question that you have post objects which contain a date field, and presumably a thread_id field, i.e. some way of identifying which thread a post belongs to?
Do you also have a thread object, or is your thread_id sufficient?
Either way, your stated goal is to return a list of threads which have posts in a particular date range. This means that you need to group your threads (rather than returning the same thread_id multiple times for each post in the date range).
This grouping can be done by using facets.
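To make the grouping concrete, here is a small in-memory sketch in Python of what a terms facet over thread_id computes once the date filter has been applied: keep the posts inside the range, count posts per thread, and return the most frequent threads. The sample posts, the function name and the size default are made up for illustration. Elasticsearch does this on the server, so the sketch mirrors only the shape of the result, not the implementation.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data: each post carries a thread_id and a date.
POSTS = [
    {"thread_id": "thread-a", "date": date(2011, 1, 3)},
    {"thread_id": "thread-a", "date": date(2011, 1, 20)},
    {"thread_id": "thread-b", "date": date(2011, 1, 15)},
    {"thread_id": "thread-b", "date": date(2011, 2, 1)},    # excluded: upper bound is exclusive
    {"thread_id": "thread-c", "date": date(2010, 12, 31)},  # excluded: before the range
]


def threads_with_posts_in_range(posts, gte, lt, size=10):
    """Return up to `size` (thread_id, post_count) pairs, most posts first."""
    counts = Counter(
        post["thread_id"] for post in posts if gte <= post["date"] < lt
    )
    return counts.most_common(size)


if __name__ == "__main__":
    # Each thread appears once, however many of its posts matched.
    print(threads_with_posts_in_range(POSTS, date(2011, 1, 1), date(2011, 2, 1)))
```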
So the query in JSON would look like this:
curl -XGET 'http://127.0.0.1:9200/posts/post/_search?pretty=1&search_type=count' -d '
{
   "facets" : {
      "thread_id" : {
         "terms" : {
            "size" : 20,
            "field" : "thread_id"
         }
      }
   },
   "query" : {
      "filtered" : {
         "query" : {
            "text" : {
               "content" : "any keywords to match"
            }
         },
         "filter" : {
            "numeric_range" : {
               "date" : {
                  "lt" : "2011-02-01",
                  "gte" : "2011-01-01"
               }
            }
         }
      }
   }
}
'
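If you would rather assemble the request in code than by hand, the following is a minimal sketch in Python using only the standard library. One function builds the same request body for a given date range, one pulls the thread_id values out of the terms-facet section of a response, and one prepares a multi-get body for the case where threads are stored as separate documents. All function names are hypothetical, the sample response is hand-written to follow the usual terms-facet shape (a list of term/count entries under facets.thread_id.terms), and nothing here talks to a cluster.

```python
import json


def build_thread_facet_query(keywords, date_gte, date_lt, size=20):
    """Assemble the search body: keyword match, date filter, thread_id facet."""
    return {
        "facets": {
            "thread_id": {"terms": {"field": "thread_id", "size": size}}
        },
        "query": {
            "filtered": {
                "query": {"text": {"content": keywords}},
                "filter": {
                    "numeric_range": {
                        # Half-open range: date_gte <= date < date_lt.
                        "date": {"gte": date_gte, "lt": date_lt}
                    }
                },
            }
        },
    }


def thread_ids_from_response(response):
    """Extract the thread_id terms, in the order the facet returned them."""
    terms = response["facets"]["thread_id"]["terms"]
    return [entry["term"] for entry in terms]


def multi_get_body(thread_ids):
    """Multi-get body; assumes the index and type are given in the request URL."""
    return {"ids": list(thread_ids)}


if __name__ == "__main__":
    body = build_thread_facet_query(
        "any keywords to match", "2011-01-01", "2011-02-01"
    )
    print(json.dumps(body, indent=2))

    # Hand-written sample response fragment, not real cluster output.
    sample = {"facets": {"thread_id": {"_type": "terms", "terms": [
        {"term": "thread-a", "count": 2},
        {"term": "thread-b", "count": 1},
    ]}}}
    print(json.dumps(multi_get_body(thread_ids_from_response(sample))))
```

The first body would be sent to the same /posts/post/_search?search_type=count endpoint as in the curl command above. The index and type holding the thread documents are not named in the question, so the multi-get URL is left out.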
Note:
I've used search_type=count because I don't actually want the posts returned, just the thread_ids.
I've asked for up to 20 thread_ids (size: 20). The default would be 10.
I've used numeric_range for the date field because dates typically have many distinct values, and the numeric_range filter uses a different approach to the range filter, making it perform better in this situation.
If your thread_ids look like how-to-perform-a-date-range-elasticsearch-query then you can use these values directly. But if you have a separate thread object, then you can use the multi-get API to retrieve these.
Your thread_id field should be mapped as { "index": "not_analyzed" } so that the whole value is treated as a single term, rather than being analyzed into separate terms.
Upvotes: 12