Kevin

Reputation: 25269

Solr spatial bad performance

I'm using Solr 3.4, spatial filtering with the schema having LatLonType (subType=tdouble). I have an index of about 20M places. My basic problem is that if I do a bbox filter with cache=true, the performance is reasonably good (~40-50 QPS, about 100-150 ms latency), but a big downside is very rapid old-gen heap growth, ultimately leading to major collections every 30-40 minutes (on a very large heap, 25GB). At that point performance is beyond unacceptable. On the other hand I can turn off caching for bbox filters, but then my QPS drops and latency rises from ~100 ms to ~500 ms. The NumericRangeQuery javadoc talks about the great performance you can get (sub-100 ms), but now I wonder if that was measured with the filterCache enabled, and nobody bothered to look at the heap growth that results. I feel like this is sort of a catch-22 since neither configuration is really acceptable.
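For reference, the filter in question looks roughly like this (the field name loc is a placeholder; the cache local param is available as of Solr 3.4):

    fq={!bbox sfield=loc pt=45.15,-93.85 d=5}
    fq={!bbox sfield=loc pt=45.15,-93.85 d=5 cache=false}

The first form goes through the filterCache; the second bypasses it.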

I'm open to any ideas. My last idea (untried) is to use geohash (and pray that it either performs better with cache=false, or has more manageable heap growth with cache=true).
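For reference, a minimal sketch of what the geohash variant would look like in schema.xml, using the built-in GeoHashField (field names are placeholders):

    <fieldType name="geohash" class="solr.GeoHashField"/>
    <field name="loc_hash" type="geohash" indexed="true" stored="false"/>

with the same kind of geofilt/bbox filter queries pointed at loc_hash.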

EDIT:

Precision step: default (8 for double I think)

System memory: 32GB (EC2 M2 2XL)

JVM: 24GB

Index size: 11 GB

EDIT2:

A tdouble with a precisionStep of 8 means that your doubles will be split into sequences of 8 bits. If all your latitudes and longitudes only differ by the last sequence of 8 bits, then tdouble would have the same performance as a normal double on a range query. This is why I suggested testing a precisionStep of 4.

Question: what does this actually mean for a double value?
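To illustrate (a rough sketch using Lucene 3.x's NumericUtils; the coordinate values are made up): Lucene indexes a tdouble as a sortable 64-bit long, plus one lower-precision term per precisionStep-bit truncation, and a range query only benefits from a truncation level if the two range endpoints actually differ at that level:

    import org.apache.lucene.util.NumericUtils;

    public class PrecisionStepDemo {
        public static void main(String[] args) {
            // Two nearby latitudes (hypothetical values)
            long a = NumericUtils.doubleToSortableLong(45.150000);
            long b = NumericUtils.doubleToSortableLong(45.150001);
            // With precisionStep=8, Lucene also indexes the sortable long
            // truncated by 8, 16, 24, ... bits; a level only helps a range
            // query where the two bounds' prefixes differ.
            for (int shift = 8; shift < 64; shift += 8) {
                System.out.printf("shift %2d: prefixes %s%n", shift,
                        (a >>> shift) == (b >>> shift) ? "equal" : "differ");
            }
        }
    }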

Upvotes: 4

Views: 1461

Answers (1)

jpountz

Reputation: 9964

A profile of Solr while it is responding to your spatial queries would be of great help in understanding what is slow; see hprof, for example.
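For example, the built-in hprof agent can sample CPU usage with something like this (an illustrative invocation; tune interval and depth as needed):

    java -agentlib:hprof=cpu=samples,interval=20,depth=8 -jar start.jar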

Still, here are a few ideas on how you could (perhaps) improve latency.

First you could test what happens when decreasing the precisionStep (try 4, for example). If the latitudes and longitudes are too close to each other and the precisionStep is too high, Lucene cannot take advantage of having several indexed values.
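For example, a sketch of the relevant schema.xml pieces (type names are placeholders, and changing precisionStep requires a full reindex):

    <fieldType name="tdouble4" class="solr.TrieDoubleField" precisionStep="4"
               omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="location" class="solr.LatLonType" subFieldType="tdouble4"/>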

You could also try giving a little less memory to the JVM in order to give the OS cache a better chance to cache frequently accessed index files.
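For example (sizes are illustrative): on the 32 GB machine above, shrinking the heap from -Xmx24g to -Xmx8g would leave roughly 20 GB for the OS page cache, enough to hold the entire 11 GB index in memory.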

Then, if it is still not fast enough, you could try replacing TrieDoubleField as the sub field type with a field type that uses a frange query for its getRangeQuery method. This would reduce the number of disk accesses while computing the range, at the cost of higher memory usage. (I have never tested it; it might provide horrible performance as well.)
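A rough sketch of that idea against the Solr 3.4 API (untested, as noted above; the class name is a placeholder and package names are from Solr 3.x):

    import org.apache.lucene.search.Query;
    import org.apache.solr.schema.SchemaField;
    import org.apache.solr.schema.TrieDoubleField;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.SolrConstantScoreQuery;
    import org.apache.solr.search.function.ValueSourceRangeFilter;

    public class FunctionRangeDoubleField extends TrieDoubleField {
        @Override
        public Query getRangeQuery(QParser parser, SchemaField field,
                                   String min, String max,
                                   boolean minInclusive, boolean maxInclusive) {
            // Check the range against FieldCache values instead of walking
            // the terms dictionary: fewer disk accesses, more heap.
            return new SolrConstantScoreQuery(new ValueSourceRangeFilter(
                    getValueSource(field, parser),
                    min, max, minInclusive, maxInclusive));
        }
    }

It would then be registered as a fieldType in schema.xml and wired in through the LatLonType sub field, e.g. subFieldType="frange_double".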

Upvotes: 1
