Reputation: 57
I have around 2 million records, each with 10-12 fields (mostly strings). I want to filter records on the basis of some field. Is it advisable to do this using a secondary index, or is there a better option? Also, how much time would it take to get all the records / just the keys after applying the filter?
Thanks in advance.
Upvotes: 2
Views: 327
Reputation: 5435
You can do a scan with a predicate filter, which is quite versatile (you can even do regex matching), or a secondary index (SI) query, which only honors equality filters on strings.
Scans are more reliable and will be even better in the upcoming release (Mar/Apr 2020) in terms of managing their progress. However, scans do require reading all records from disk first and then applying the filter. A sketch of the scan approach is below.
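For illustration, here is a minimal sketch of a predicate-filter scan using the Aerospike Java client. The namespace `test`, set `users`, and string bin `status` are placeholder names, and the exact predicate API (here `Policy.predExp` with `PredExp` expressions) can differ between client versions, so treat this as a starting point rather than a drop-in solution.

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.ScanPolicy;
import com.aerospike.client.query.PredExp;

public class ScanFilterExample {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        // Scan policy with a predicate filter: status == "active".
        // The filter is evaluated server-side after each record is read from disk,
        // so the whole set is still scanned. (predExp field name assumed; check
        // your client version.)
        ScanPolicy policy = new ScanPolicy();
        policy.predExp = new PredExp[] {
            PredExp.stringBin("status"),
            PredExp.stringValue("active"),
            PredExp.stringEqual()
        };

        // scanAll streams every matching record back through the callback.
        client.scanAll(policy, "test", "users", (Key key, Record record) -> {
            System.out.println(key + " -> " + record.bins);
        });

        client.close();
    }
}
```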
An SI query will be faster because the filtering happens against the in-memory secondary index before the records are fetched from disk, but it is less reliable if the underlying cluster nodes are not stable, i.e. if you lose or add a node during the SI query. The query runs in parallel on all cluster nodes and pipelines the results back to the client in no particular order. You can mitigate that by using the "failOnClusterChange" option and restarting the query once the cluster is stable again. (Scans also have the same option available.) A sketch of this approach follows.
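And a corresponding sketch of an equality query against a secondary index with "failOnClusterChange" enabled, again with the Aerospike Java client. The index name `status_idx` and bin `status` are placeholders, and the policy field name is assumed from the client versions of that era, so verify against your client's documentation.

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.policy.QueryPolicy;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.IndexType;
import com.aerospike.client.query.RecordSet;
import com.aerospike.client.query.Statement;
import com.aerospike.client.task.IndexTask;

public class SiQueryExample {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        // One-time setup: build a string secondary index on the bin being filtered.
        IndexTask task = client.createIndex(null, "test", "users",
                "status_idx", "status", IndexType.STRING);
        task.waitTillComplete();

        // Equality query served by the in-memory secondary index.
        Statement stmt = new Statement();
        stmt.setNamespace("test");
        stmt.setSetName("users");
        stmt.setFilter(Filter.equal("status", "active"));

        // Fail fast if the cluster changes mid-query, so the application can
        // retry once the cluster is stable. (Field name assumed for this version.)
        QueryPolicy qp = new QueryPolicy();
        qp.failOnClusterChange = true;

        RecordSet rs = client.query(qp, stmt);
        try {
            while (rs.next()) {
                System.out.println(rs.getKey() + " -> " + rs.getRecord().bins);
            }
        } finally {
            rs.close();
            client.close();
        }
    }
}
```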
Which is better? Do an A/B test on your specific problem.
Upvotes: 3