szab.kel

Reputation: 2536

Solr facet equivalent of group by?

If I have some data like this:

{"field1":"x", "field2":".."}
{"field1":"x", "field2":".."}
{"field1":"y", "field2":".."}
{"field1":"y", "field2":".."}
{"field1":"y", "field2":".."}

Using a simple group=true&group.field=field1&group.limit=0 I get results like this:

{
  "responseHeader":{..},
  "grouped":{
    "field1": {
      "matches": 5,
      "groups": [
        {"groupValue": "x", "doclist":{"numFound": 2, ...}},
        {"groupValue": "y", "doclist":{"numFound": 3, ...}}
      ]
    }
  }
}

From this, I know the number of documents found for each groupValue (numFound). The problem is that I need to sort the resulting groups by that count in descending order, which is not possible with either sort parameter: a simple sort=numFound results in an exception saying the field numFound does not exist, and group.sort only sorts the documents inside each group.

Is there an equivalent of this using facets where I can sort the results by count?

Upvotes: 0

Views: 1747

Answers (1)

lzagkaretos

Reputation: 2910

You can try:

http://localhost:8983/solr/your_core/select?facet.field=field1&facet.sort=count&facet.limit=-1&facet=on&indent=on&q=*:*&rows=0&start=0&wt=json

The result will be something like:

{
  "responseHeader":{
    "status":0,
    "QTime":17,
    "params":{
      "q":"*:*",
      "facet.field":"field1",
      "indent":"on",
      "start":"0",
      "rows":"0",
      "facet":"on",
      "wt":"json"}},
  "response":{"numFound":225364,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "field1":[
        "x",113550,
        "y",111814]},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}
  }
}

Just tested with Solr 6.3.0.
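Note that facet_fields returns each field as a flat list that alternates value and count, already ordered by count because of facet.sort=count. A minimal Python sketch for turning that list into pairs on the client side follows; the helper name pair_facet_counts is hypothetical, and the sample dictionary is copied from the response above rather than fetched from a live server.

```python
def pair_facet_counts(flat):
    """Turn a flat [value, count, value, count, ...] list into (value, count) pairs."""
    values = flat[0::2]  # even positions hold the facet values
    counts = flat[1::2]  # odd positions hold the matching counts
    return list(zip(values, counts))


# Sample copied from the response shown above (not a live request).
response = {
    "facet_counts": {
        "facet_fields": {"field1": ["x", 113550, "y", 111814]}
    }
}

pairs = pair_facet_counts(response["facet_counts"]["facet_fields"]["field1"])
for value, count in pairs:
    print(value, count)
```

The order Solr returned is preserved, so no extra sorting is needed on the client.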

For more information, see the faceting section of the Solr documentation.

If you also want the number of distinct facet values in the same request, you can use the Solr Stats Component (provided the field is of type numeric, string, or date).
Keep in mind, though, that this can add server performance and memory overhead.

Running a query like:

http://localhost:8983/solr/your_core/select?facet.field=field1&facet.sort=count&facet.limit=10&facet=true&indent=on&q=*:*&rows=0&start=0&wt=json&stats=true&stats.field={!cardinality=true}field1

The response is something like:

{
  "responseHeader":{
    "status":0,
    "QTime":614,
    "params":{
      "facet.limit":"10",
      "q":"*:*",
      "facet.field":"field1",
      "indent":"on",
      "stats":"true",
      "start":"0",
      "rows":"0",
      "facet":"true",
      "wt":"json",
      "facet.sort":"count",
      "stats.field":"{!cardinality=true}field1"}},
  "response":{"numFound":2336315,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "field1":[
        "Value1",708116,
        "Value2",607088,
        "Value3",493949,
        "Value4",314433,
        "Value5",104478,
        "Value6",41099,
        "Value7",28879,
        "Value8",18767,
        "Value9",9308,
        "Value10",4545]},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}},
  "stats":{
    "stats_fields":{
      "field1":{
        "cardinality":27}}}}
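If you build this request from code, the {!cardinality=true} local parameter has to be URL-encoded. Here is a rough Python sketch using only the standard library; the function names build_facet_stats_url and read_cardinality are hypothetical, the base URL is the placeholder from the query above, and the sample response is trimmed from the output shown above instead of being fetched.

```python
from urllib.parse import urlencode


def build_facet_stats_url(base_url, field, limit=10):
    """Build a select URL that facets on `field` and also asks for its cardinality."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.sort", "count"),
        ("facet.limit", limit),
        ("stats", "true"),
        # Local params such as {!cardinality=true} are percent-encoded by urlencode.
        ("stats.field", "{!cardinality=true}" + field),
    ]
    return base_url + "?" + urlencode(params)


def read_cardinality(response, field):
    """Read the distinct-value count for `field` from a parsed JSON response."""
    return response["stats"]["stats_fields"][field]["cardinality"]


url = build_facet_stats_url("http://localhost:8983/solr/your_core/select", "field1")
print(url)

# Trimmed sample of the response shown above (not a live request).
sample = {"stats": {"stats_fields": {"field1": {"cardinality": 27}}}}
print(read_cardinality(sample, "field1"))
```

Comparing the cardinality with the number of facet values returned tells you whether facet.limit cut the list short.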

For more information about stats, see the Stats Component section of the Solr documentation.

Upvotes: 1
