Solr CollapsingQParserPlugin with group.facet=on style facet counts

Question

I have a Solr index of about 5 million documents at 8GB using Solr 4.7.0. I require grouping in Solr, but find it to be too slow. Here is the group configuration:

group=on
group.facet=on
group.field=workId
group.ngroups=on

The machine has ample memory at 24GB and 4GB is allocated to Solr itself. Queries are generally taking about 1200ms compared to 90ms when grouping is turned off.

I ran across a plugin called CollapsingQParserPlugin, which is applied as a filter query and removes all but one document from each group.

fq={!collapse field=workId}

It's designed for indexes that have a lot of unique groups; I have about 3.8 million. This approach is much faster, at about 120ms. It's a beautiful solution for me except for one thing: because it filters out the other members of each group, only facets from the representative document are counted. For instance, if I have the following three documents:

"docs": [
  {
    "id": "1",
    "workId": "abc",
    "type": "book"
  },
  {
    "id": "2",
    "workId": "abc",
    "type": "ebook"
  },
  {
    "id": "3",
    "workId": "abc",
    "type": "ebook"
  }
]

once collapsed, only the top one shows up in the results. Because the other two get filtered out, the facet counts look like

"type": ["book", 1]

instead of

"type": ["book", 1, "ebook", 1]
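
To make the difference concrete, here is a small plain-Python sketch (not Solr code) that computes both kinds of counts over the three documents above. The function names are made up for illustration, and it assumes the first document of each group is the representative one; in Solr the representative is chosen by the collapse parser.

```python
from collections import Counter

docs = [
    {"id": "1", "workId": "abc", "type": "book"},
    {"id": "2", "workId": "abc", "type": "ebook"},
    {"id": "3", "workId": "abc", "type": "ebook"},
]


def collapsed_facets(docs, group_field, facet_field):
    """Facet only over one representative per group (collapse-style).

    Assumption: the first document seen in a group is its representative.
    """
    representatives = {}
    for doc in docs:
        representatives.setdefault(doc[group_field], doc)
    return dict(Counter(d[facet_field] for d in representatives.values()))


def group_facets(docs, group_field, facet_field):
    """Count distinct groups per facet value (group.facet-style)."""
    groups_per_value = {}
    for doc in docs:
        groups_per_value.setdefault(doc[facet_field], set()).add(doc[group_field])
    return {value: len(groups) for value, groups in groups_per_value.items()}


print(collapsed_facets(docs, "workId", "type"))  # {'book': 1}
print(group_facets(docs, "workId", "type"))      # {'book': 1, 'ebook': 1}
```

The second function is the behaviour I am after: each facet value is counted once per work, no matter how many documents of that work carry it.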

Is there a way to get group.facet counts using the collapse filter query?

Charles · Accepted Answer

According to Yonik Seeley, the correct group facet counts can be gathered using the JSON Facet API. His comments can be found at:

https://issues.apache.org/jira/browse/SOLR-7036?focusedCommentId=15601789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15601789

I tested out his method and it works great. I still use the CollapsingQParserPlugin to collapse the results, but I exclude the filter when counting up the facets like so:

fq={!tag=workId}{!collapse field=workId}

json.facet={
  type: {
    type: terms,
    field: type,
    facet: {
      workCount: "unique(workId)"
    },
    domain: {
      excludeTags: [workId]
    }
  }
}
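
If you build the request from code, the parameters above can be assembled roughly like this. It is only a sketch: the host, port, and core name (`works`) are placeholders I made up, and `build_params` is a hypothetical helper. The facet definition is serialized as strict JSON, which the JSON Facet API accepts. Also note that, as far as I know, the JSON Facet API only arrived in Solr 5.1, so this assumes a server newer than the 4.7.0 mentioned in the question.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/works/select"


def build_params(query="*:*"):
    """Build the collapse filter plus a JSON facet that ignores that filter."""
    json_facet = {
        "type": {
            "type": "terms",
            "field": "type",
            # Distinct works per bucket: the group.facet-style count.
            "facet": {"workCount": "unique(workId)"},
            # Exclude the tagged collapse filter when computing facets.
            "domain": {"excludeTags": ["workId"]},
        }
    }
    return {
        "q": query,
        "wt": "json",
        "fq": "{!tag=workId}{!collapse field=workId}",
        "json.facet": json.dumps(json_facet),
    }


# urlencode takes care of escaping the braces, quotes, and spaces.
request_url = SOLR_SELECT_URL + "?" + urlencode(build_params())
print(request_url)
```

The tag name in the fq and the name in excludeTags have to match; here both are workId, as in the raw parameters above.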

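And the result, where count is the number of documents in each bucket and workCount is the number of distinct workId values, i.e. the group.facet-style count: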

{  
  "facets": {  
    "count": 3,
    "type": {  
      "buckets": [  
        {  
          "val": "ebook",
          "count": 2,
          "workCount": 1
        },
        {  
          "val": "book",
          "count": 1,
          "workCount": 1
        }
      ]
    }
  }
}
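
On the client side, the group-style counts can then be read from the workCount field of each bucket. A minimal sketch, assuming the response has the shape shown above; `group_facet_counts` is a hypothetical helper name, and the response text is just the example output pasted in as a string.

```python
import json

# The example response from above, embedded as a string for illustration.
response_text = """
{
  "facets": {
    "count": 3,
    "type": {
      "buckets": [
        {"val": "ebook", "count": 2, "workCount": 1},
        {"val": "book", "count": 1, "workCount": 1}
      ]
    }
  }
}
"""


def group_facet_counts(response, facet_name, stat_name="workCount"):
    """Map each bucket value to its distinct-group count."""
    buckets = response["facets"][facet_name]["buckets"]
    return {bucket["val"]: bucket[stat_name] for bucket in buckets}


counts = group_facet_counts(json.loads(response_text), "type")
print(counts)  # {'ebook': 1, 'book': 1}
```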
