Marc Seeger

Reputation: 2737

distinct SOLR field values without count

My question is pretty similar to this question
The difference: I need the least RAM-intensive way to gather information about the distinct values. I DON'T care about the actual count in this case; I just want to know the possible values for that field.
I'm constantly running out of heap space (30 million+ documents), and there has to be some way or parameter to do this in a memory-saving way.

Upvotes: 2

Views: 4083

Answers (3)

Risadinha

Reputation: 16671

Use the StatsComponent to retrieve a list of distinct values for a certain field: https://cwiki.apache.org/confluence/display/solr/The+Stats+Component

Parameter stats.calcdistinct:

If true, distinct values will be calculated and returned as "countDistinct" and "distinctValues" in the response. This calculation may be expensive for some fields, so it is false by default. If you'd only like to return distinct values for specific fields, you can also specify f.<field>.stats.calcdistinct, replacing <field> with your field name, to limit the distinct value calculation to the required field.
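A minimal sketch of such a request in Python, assuming a core reachable at a hypothetical `http://localhost:8983/solr/mycore/select` and a hypothetical field named `category`. It asks for zero documents (`rows=0`), enables stats on one field, and turns on `calcdistinct` only for that field via the per-field form. The helper names `stats_distinct_url` and `distinct_values` are illustrative, and the response layout (`stats` → `stats_fields` → field → `distinctValues`) is the usual JSON shape, shown here on a hand-made sample rather than real output.

```python
from urllib.parse import urlencode

# Hypothetical core URL; adjust to your own installation.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def stats_distinct_url(field: str) -> str:
    """Build a stats request that computes distinct values for `field` only."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),                               # no documents, just the stats section
        ("wt", "json"),
        ("stats", "true"),
        ("stats.field", field),
        (f"f.{field}.stats.calcdistinct", "true"),  # per-field calcdistinct
    ]
    return SOLR_SELECT + "?" + urlencode(params)


def distinct_values(response: dict, field: str) -> list:
    """Extract the distinctValues list from a parsed JSON stats response."""
    return response["stats"]["stats_fields"][field]["distinctValues"]


# Usage with a hand-made sample response (not real Solr output):
sample = {"stats": {"stats_fields": {"category": {
    "countDistinct": 2, "distinctValues": ["books", "music"]}}}}
print(stats_distinct_url("category"))
print(distinct_values(sample, "category"))
```

The full list is still built on the server side, so this saves the facet counts but not necessarily the memory needed to collect every distinct value.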

To keep the load down, retrieve the values as rarely as possible, cache the results, and only retrieve them again when the data has changed.
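One way to follow that advice is sketched below in Python. It assumes you can cheaply obtain some marker that changes whenever the index changes (for example an index version or last-commit timestamp). Both `fetch` and `current_version` are hypothetical callables you would supply; the class only re-runs the expensive query when that marker differs from the one seen last time.

```python
from typing import Callable, Hashable, List, Optional


class DistinctValuesCache:
    """Cache distinct values and re-fetch them only when the index version changes.

    `fetch` (runs the expensive Solr query) and `current_version` (returns a
    cheap change marker) are hypothetical callables supplied by the caller.
    """

    def __init__(self, fetch: Callable[[], List[str]],
                 current_version: Callable[[], Hashable]) -> None:
        self._fetch = fetch
        self._current_version = current_version
        self._version: Optional[Hashable] = None
        self._values: List[str] = []
        self._loaded = False

    def get(self) -> List[str]:
        version = self._current_version()
        if not self._loaded or version != self._version:
            self._values = self._fetch()  # expensive call, only on change
            self._version = version
            self._loaded = True
        return self._values
```

Checking the marker on every call keeps the logic simple; if even that check is too costly, a time-based refresh interval would be an alternative.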

If your index is slow in general, you might want to have a look at the cache configuration and/or give Solr more RAM (if you have the means).

Originally answered here (by me):

https://stackoverflow.com/a/26714447/621690

Upvotes: 1

Pascal Dimassimo

Reputation: 6928

If the number of distinct values is high, you will probably need to do facet paging. Use the facet.offset and facet.limit parameters.
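A minimal paging loop in Python is sketched below. `fetch` is a hypothetical function that sends the given parameters to Solr and returns the parsed JSON response. The sketch also assumes `facet.sort=index` (so offsets walk a stable order), `facet.mincount=1` (to skip values with no matching documents) and the default flat JSON layout for `facet_fields`, where values and counts alternate; only the values are kept.

```python
from typing import Callable, Iterator


def iter_facet_values(field: str,
                      fetch: Callable[[dict], dict],
                      page_size: int = 1000) -> Iterator[str]:
    """Yield every distinct value of `field`, one facet page at a time."""
    offset = 0
    while True:
        params = {
            "q": "*:*",
            "rows": 0,                # no documents, only facets
            "facet": "true",
            "facet.field": field,
            "facet.mincount": 1,      # skip values that match no documents
            "facet.sort": "index",    # stable order so offsets don't shift
            "facet.limit": page_size,
            "facet.offset": offset,
        }
        flat = fetch(params)["facet_counts"]["facet_fields"][field]
        values = flat[0::2]           # flat list: value, count, value, count, ...
        yield from values
        if len(values) < page_size:   # a short page means we reached the end
            return
        offset += page_size
```

Each response stays small, although the counts are still computed on the server; the loop simply discards them.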

Upvotes: 1

Jem

Reputation: 551

I don't know about RAM usage, but you might want to try field collapsing. You will find the patch for Solr here.

Upvotes: 0
