I'm using the Solarium PHP library to connect to a SOLR instance. I have an index with around 3.5 million documents. Searching and filtering work great, but there is one thing I can't get to work well with SOLR.
The documents describe companies. I want to know how many unique phone numbers are in the index for a given query. Some companies are related and share a phone number, and some don't have a phone number at all.
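To pin down what is being counted: a phone number shared by several related companies should count once, and companies without a phone number should not count at all. A minimal in-memory sketch of that definition, assuming each matched document is a plain associative array with an optional `phone` key (a hypothetical shape for illustration, not Solarium's document class; `countUniquePhones` is likewise a hypothetical helper):

```php
<?php
declare(strict_types=1);

/**
 * Count distinct, non-empty phone numbers in a list of documents.
 * Hypothetical helper; documents are plain arrays here, not Solarium objects.
 */
function countUniquePhones(array $docs): int
{
    $seen = [];
    foreach ($docs as $doc) {
        $phone = $doc['phone'] ?? null;
        // Companies without a phone number are skipped entirely
        if ($phone === null || $phone === '') {
            continue;
        }
        // Using the number as an array key deduplicates shared numbers
        $seen[(string) $phone] = true;
    }
    return count($seen);
}

$docs = [
    ['name' => 'Acme',       'phone' => '0201234567'],
    ['name' => 'Acme Sales', 'phone' => '0201234567'], // related company, same number
    ['name' => 'Beta',       'phone' => '0307654321'],
    ['name' => 'Gamma'],                               // no phone number
];
echo countUniquePhones($docs), PHP_EOL; // prints 2
```

Solr has to do the equivalent of this over every document matching the query, which is why large result sets are the expensive case.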
Facets are not really an option since they return at most 100 values per request (Solr's default `facet.limit`). For 3.5 million documents that would mean a lot of requests. I tried the `getStats()` option, but that was slow too. I finally resorted to GroupComponent queries, which seem to do the job.
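For reference, the 100-value cap is only Solr's default `facet.limit`; setting it to `-1` asks Solr for all values in a single request, and `facet.mincount=1` drops values that have no hits in the current result set. A minimal sketch, assuming Solr's standard JSON response layout in which `facet_counts.facet_fields.phone` is a flat `[value, count, value, count, ...]` list; the helper names are hypothetical:

```php
<?php
declare(strict_types=1);

/** Raw Solr parameters for an unlimited field facet on `phone` (hypothetical helper). */
function phoneFacetParams(string $query): array
{
    return [
        'q'              => $query,
        'rows'           => 0,       // no documents, only facet counts
        'facet'          => 'true',
        'facet.field'    => 'phone',
        'facet.limit'    => -1,      // default is 100; -1 means no limit
        'facet.mincount' => 1,       // skip values with zero hits in this result set
    ];
}

/** Number of distinct phone values in a decoded JSON facet response. */
function distinctFromFacetResponse(array $response): int
{
    // With the default json.nl=flat, values and counts alternate in one list
    $flat = $response['facet_counts']['facet_fields']['phone'] ?? [];
    return intdiv(count($flat), 2);
}

$response = json_decode(
    '{"facet_counts":{"facet_fields":{"phone":["0201234567",2,"0307654321",1]}}}',
    true
);
echo distinctFromFacetResponse($response), PHP_EOL; // prints 2
```

Solarium's facet field component has matching setters (`setLimit()`, `setMinCount()`), but transferring every distinct value just to count them can still be heavy when there are millions of numbers.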
Still, with the grouping approach, when the result set is large (100k+ documents) the request takes a very long time and eventually crashes SOLR. I increased the memory limits to prevent the crashes, but it still doesn't finish within a reasonable time. This is my code:
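```php
// Group on the phone field and ask Solr for the number of distinct groups
$groupComponent = $select->getGrouping();
$groupComponent->addField('phone');
$groupComponent->setNumberOfGroups(true); // group.ngroups=true
$groupComponent->setLimit(0);             // group.limit: documents returned per group
$groupComponent->setTruncate(true);       // group.truncate=true
$groupComponent->setFormat('simple');     // group.format=simple
$groupComponent->setFacet(true);          // group.facet=true

$resultset = $this->client->execute($select);
$groups = $resultset->getGrouping();
```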
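The grouping settings above correspond roughly to the standard `group.*` request parameters. Writing them out makes it possible to replay the same request directly against Solr (e.g. with curl) and see whether the time is spent in Solr or in Solarium. A sketch; `phoneGroupingParams` is a hypothetical helper, and the mapping in the comments reflects the setter-to-parameter correspondence as I understand it:

```php
<?php
declare(strict_types=1);

/**
 * Raw Solr request parameters that the grouping setup roughly maps to.
 * Hypothetical helper; handy for replaying the request outside Solarium.
 */
function phoneGroupingParams(string $query): array
{
    return [
        'q'              => $query,
        'group'          => 'true',
        'group.field'    => 'phone',  // addField('phone')
        'group.ngroups'  => 'true',   // setNumberOfGroups(true)
        'group.limit'    => 0,        // setLimit(0): documents returned per group
        'group.truncate' => 'true',   // setTruncate(true)
        'group.format'   => 'simple', // setFormat('simple')
        'group.facet'    => 'true',   // setFacet(true)
    ];
}

// Query string to append to the core's /select handler URL
echo http_build_query(phoneGroupingParams('*:*')), PHP_EOL;
```

Timing the replayed request with and without `group.facet` and `group.truncate` may show which part dominates.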
I actually only need the counts, not the results. I set the limit to 0, but I'm not sure whether that means zero or unlimited in this case; setting it to 1 makes no difference. So I'm not sure whether it is possible to get just the counts. I have also tried adding `$groupComponent->setMainResult(true);`, but that doesn't make it faster and always seems to return 0 for the number of phone numbers.
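Since only the count is needed, a count-only variant can also set `rows=0`, so that no groups (and therefore no documents) are returned, while still asking for `group.ngroups`. Documents without a phone number may end up in one extra group with a `null` value; a filter query such as `phone:[* TO *]` (an assumption about how the field is indexed) keeps them out. A minimal sketch with hypothetical helper names, reading `grouped.phone.ngroups` from the decoded JSON response; the response string is a made-up fixture:

```php
<?php
declare(strict_types=1);

/** Count-only variant of the grouping request (hypothetical helper). */
function phoneCountParams(string $query): array
{
    return [
        'q'             => $query,
        'fq'            => 'phone:[* TO *]', // only companies that have a phone number
        'rows'          => 0,                // return no groups, only the counts
        'group'         => 'true',
        'group.field'   => 'phone',
        'group.ngroups' => 'true',
        'group.limit'   => 0,
    ];
}

/** Read the number of distinct phone groups from a decoded JSON response. */
function ngroupsFromResponse(array $response): ?int
{
    $ngroups = $response['grouped']['phone']['ngroups'] ?? null;
    return $ngroups === null ? null : (int) $ngroups;
}

// Made-up response for illustration
$json = '{"grouped":{"phone":{"matches":4,"ngroups":2,"groups":[]}}}';
echo ngroupsFromResponse(json_decode($json, true)), PHP_EOL; // prints 2
```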
If anybody has a suggestion for speeding this up, either in Solarium or directly in SOLR, I'd love to hear it. Thanks!