Reputation: 2570
I know the title might suggest it is a duplicate but I haven't been able to find the answer to this specific issue:
I have to filter search results based on a date range. The date of each document is stored (but not indexed) with the document. When using a Filter, I noticed that the filter is called with all the documents in the index.
This means the filter will get slower as the index grows (currently only ~300,000 documents in it) as it has to iterate through every single document.
I can't use RangeQuery since the date is not indexed.
How can I apply the filter only to the documents that match the query, after the query has run, to make it more efficient?
I would prefer to do this before the results are handed to me, so that I don't mess up the scores and collectors I have.
Upvotes: 1
Views: 2339
Reputation: 3941
Not quite sure if this will help, but I had a similar problem to yours and came up with the following (+ notes):
So, to summarise: index your date fields as numeric fields; build your queries as numeric range queries; transform these into cached filter wrappers and hang onto them.
I think you'll see some spectacular speedups over your current index usage.
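To illustrate why an indexed numeric date plus a cached range result helps, here is a minimal stdlib-only sketch. It is not Lucene's API: the class `DateRangeIndexSketch` and its methods are hypothetical names made up for this example, and a `TreeMap` keyed by epoch day stands in for the indexed numeric field. The point is only that a sorted index lets a range lookup visit the entries inside the range instead of every document, and that the resulting bit set can be kept and reused.

```java
import java.time.LocalDate;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for an indexed numeric date field; not Lucene's API.
class DateRangeIndexSketch {
    // Sorted map from date (as epoch day) to the ids of documents with that date.
    private final NavigableMap<Long, BitSet> docsByDay = new TreeMap<>();
    // Cache of already computed range results, keyed by "from:to".
    private final Map<String, BitSet> cache = new HashMap<>();

    void add(int docId, LocalDate date) {
        docsByDay.computeIfAbsent(date.toEpochDay(), k -> new BitSet()).set(docId);
        cache.clear(); // cached ranges are stale once the index changes
    }

    // Returns the ids of documents dated within [from, to], both inclusive.
    // Assumes from is not after to.
    BitSet range(LocalDate from, LocalDate to) {
        String key = from + ":" + to;
        BitSet cached = cache.get(key);
        if (cached == null) {
            cached = new BitSet();
            // Only the entries inside the range are visited, not every document.
            for (BitSet docs : docsByDay
                    .subMap(from.toEpochDay(), true, to.toEpochDay(), true)
                    .values()) {
                cached.or(docs);
            }
            cache.put(key, cached);
        }
        return (BitSet) cached.clone(); // callers may modify their copy
    }
}
```

Calling `range` twice with the same bounds builds the bit set once and serves the second call from the cache, which is roughly the effect the cached filter wrapper is meant to give you.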
Good luck!
P.S. I would never second-guess what will be fast or slow when using Lucene. I've always been surprised in both directions!
Upvotes: 3
Reputation: 6928
First, to filter on a field, it has to be indexed.
Second, using a Filter is considered the best way to restrict the set of documents to search on. One reason is that you can cache the filter results and reuse them for other queries. The filter data structure is also quite efficient: it is a bit set with one bit per document, set for the documents that match the filter.
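As a rough picture of what "a cached bit set of matching documents" means, here is a small stdlib-only sketch. It does not use Lucene's Filter class: `CachedBitSetFilterSketch`, `bitsFor` and `restrict` are hypothetical names invented for the illustration, and the `computations` counter is only there to show that the bit set is built once.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical illustration of a cached bit-set filter; not Lucene's Filter class.
class CachedBitSetFilterSketch {
    private final int maxDoc; // number of documents in the (pretend) index
    private final Map<String, BitSet> cache = new HashMap<>();
    int computations = 0; // counts how often a filter was actually built

    CachedBitSetFilterSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Builds the bit set for a named filter once, then serves it from the cache.
    BitSet bitsFor(String name, IntPredicate matches) {
        return cache.computeIfAbsent(name, k -> {
            computations++;
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (matches.test(doc)) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    // Restricts a query's hits to the documents the filter allows.
    static BitSet restrict(BitSet queryHits, BitSet filterBits) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filterBits); // bitwise AND keeps only docs present in both
        return result;
    }
}
```

The one pass over all documents is paid the first time a filter is built; every later query that uses the same filter only costs a bitwise AND against the cached bits.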
But if you insist on not using filters, I think the only way is to use a boolean query to do the filtering.
Upvotes: 1