Reputation: 375
I have a Solr index of about 5 million documents at 8GB using Solr 4.7.0. I require grouping in Solr, but find it to be too slow. Here is the group configuration:
group=on
group.facet=on
group.field=workId
group.ngroups=on
The machine has ample memory at 24GB and 4GB is allocated to Solr itself. Queries are generally taking about 1200ms compared to 90ms when grouping is turned off.
I ran across a plugin called CollapsingQParserPlugin which uses a filter query to remove all but one of a group.
fq={!collapse field=workId}
It's designed for indexes that have a lot of unique groups. I have about 3.8 million. This approach is much much faster at about 120ms. It's a beautiful solution for me except for one thing. Because it filters out other members of the group, only facets from the representative document are counted. For instance, if I have the following three documents:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book"
},
{
"id": "2",
"workId": "abc",
"type": "ebook"
},
{
"id": "3",
"workId": "abc",
"type": "ebook"
}
]
once collapsed, only the top one shows up in the results. Because the other two get filtered out, the facet counts look like
"type": ["book":1]
instead of
"type": ["book":1, "ebook":1]
Is there a way to get group.facet counts using the collapse filter query?
Upvotes: 4
Views: 1457
Reputation: 375
According to Yonik Seeley, the correct group facet counts can be gathered using the JSON Facet API. His comments can be found at:
I tested out his method and it works great. I still use the CollapsingQParserPlugin to collapse the results, but I exclude the filter when counting up the facets like so:
fq={!tag=workId}{!collapse field=workId}
json.facet={
type: {
type: terms,
field: type,
facet: {
workCount: "unique(workId)"
},
domain: {
excludeTags: [workId]
}
}
}
And the result:
{
"facets": {
"count": 3,
"type": {
"buckets": [
{
"val": "ebook",
"count": 2,
"workCount": 1
},
{
"val": "book",
"count": 1,
"workCount": 1
}
]
}
}
}
Upvotes: 3
Reputation: 375
I was unable to find a way to do this with Solr or plugin configurations, so I developed a work around to effectively create group facet counts while still using the CollapsingQParserPlugin.
I do this by making a duplicate of the fields I'll be faceting on and making sure all facet values for the entire group are in each document like so:
"docs": [
{
"id": "1",
"workId": "abc",
"type": "book",
"facetType": [
"book",
"ebook"
]
},
{
"id": "2",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
},
{
"id": "3",
"workId": "abc",
"type": "ebook",
"facetType": [
"book",
"ebook"
]
}
]
When I ask Solr to generate facet counts, I use the new field:
facet.field=facetType
This ensures that all facet values are accounted for and that the counts represent groups. But when I use a filter query, I revert back to using the old field:
fq=type:book
This way the correct document is chosen to represent the group.
I know this is a dirty, complex way to make it work, but it does work and that's what I needed. Also it requires the ability to query your documents before insertion into Solr, which calls for some development. If anyone has a simpler solution I would still love to hear it.
Upvotes: 2