gib

Reputation: 788

Grouping results in SOLR?

I have a Solr index whose documents look like this:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "*:*",
      "q.op": "OR",
      "_": "1673422604341"
    }
  },
  "response": {
    "numFound": 1206,
    "start": 0,
    "numFoundExact": true,
    "docs": [
      {
        "material_name_s":"MaterialName1",
        "company_name_s": "CompanyName1",
        "price_per_lb_value_f": 1.11,
        "received_date_dt": "2015-01-01T00:00:00Z"
      },
      {
        "material_name_s":"MaterialName1",
        "company_name_s": "CompanyName2",
        "price_per_lb_value_f": 2.22,
        "received_date_dt": "2020-01-01T00:00:00Z"
      },
      {
        "material_name_s":"MaterialName1",
        "company_name_s": "CompanyName3",
        "price_per_lb_value_f": 3.33,
        "received_date_dt": "2021-01-01T00:00:00Z"
      },
      {
        "material_name_s":"MaterialName2",
        "company_name_s": "CompanyName1",
        "price_per_lb_value_f": 4.44,
        "received_date_dt": "2016-01-01T00:00:00Z"
      },
      {
        "material_name_s":"MaterialName2",
        "company_name_s": "CompanyName2",
        "price_per_lb_value_f": 5.55,
        "received_date_dt": "2021-01-01T00:00:00Z"
      },
      {
        "material_name_s":"MaterialName2",
        "company_name_s": "CompanyName3",
        "price_per_lb_value_f": 6.66,
        "received_date_dt": "2022-01-01T00:00:00Z"
      }
    ]
  }
}

These are historical prices for different materials from different companies.

I would like to get the lowest price_per_lb_value_f for each material_name_s in the last 2 years, so the results would look like this:

{
  "response": {
    "numFound": 2,
    "start": 0,
    "numFoundExact": true,
    "docs": [
      {
        "material_name_s":"MaterialName1",
        "company_name_s": "CompanyName3",
        "price_per_lb_value_f": 3.33,
        "received_date_dt": "2021-01-01T00:00:00Z"
      },
      {
        "material_name_s":"MaterialName2",
        "company_name_s": "CompanyName2",
        "price_per_lb_value_f": 5.55,
        "received_date_dt": "2021-01-01T00:00:00Z"
      }
    ]
  }
}

Is this kind of grouping even possible with Solr? I'm a newbie to Solr, so any help would be appreciated.

Upvotes: 0

Views: 138

Answers (1)

Seasers

Reputation: 546

Grouping is possible in Solr. You can get the result you want with either of the following queries:

  1. Field collapsing approach (recommended in your case): https://solr.apache.org/guide/solr/latest/query-guide/collapse-and-expand-results.html
http://localhost:8983/solr/test/select?indent=true&q.op=OR&q=received_date_dt:[NOW-3YEAR%20TO%20*]&fq={!collapse%20field=material_name_s%20min=price_per_lb_value_f}

q:received_date_dt:[NOW-3YEAR TO *] // Range query that keeps only the documents received in the last 3 years. I used 3 years instead of 2 because otherwise the documents received on 2021-01-01 would not match.
fq:{!collapse field=material_name_s min=price_per_lb_value_f} // Collapses all documents that share the same material_name_s value into a single document: the one with the lowest price_per_lb_value_f.

  2. Grouping approach: https://solr.apache.org/guide/solr/latest/query-guide/result-grouping.html
http://localhost:8983/solr/test/select?indent=true&q.op=OR&q=received_date_dt:[NOW-3YEAR%20TO%20*]&group=true&group.field=material_name_s&group.sort=price_per_lb_value_f%20asc

q:received_date_dt:[NOW-3YEAR TO *] // same filter as before
group:true // enable grouping
group.field:material_name_s // groups by material_name_s
group.sort:price_per_lb_value_f asc // sort each group by the field price_per_lb_value_f in ascending order
group.limit: not specified, because the default value is already 1 // sets the number of documents returned for each group
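
If you would rather build these requests in code than paste the URLs by hand, here is a minimal Python sketch that assembles the same parameters for both approaches. It only builds the URLs and does not send anything. The host, port and core name (`test`) are taken from my URLs above and are assumptions about your setup, and the helper names (`date_filter`, `collapse_params`, `grouping_params`, `select_url`) are made up for this example.

```python
from urllib.parse import urlencode

# Assumed values: a local Solr instance and a core named "test", as in the URLs above.
SOLR_SELECT_URL = "http://localhost:8983/solr/test/select"


def date_filter(years):
    """Range query matching documents received in the last `years` years."""
    return f"received_date_dt:[NOW-{years}YEAR TO *]"


def collapse_params(years=3):
    """Parameters for the field collapsing approach."""
    return {
        "q": date_filter(years),
        # Keep one document per material_name_s: the one with the lowest price.
        "fq": "{!collapse field=material_name_s min=price_per_lb_value_f}",
    }


def grouping_params(years=3):
    """Parameters for the result grouping approach."""
    return {
        "q": date_filter(years),
        "group": "true",
        "group.field": "material_name_s",
        "group.sort": "price_per_lb_value_f asc",
        # group.limit is left out on purpose: it defaults to 1.
    }


def select_url(params):
    """Build a percent-encoded /select URL from a dict of parameters."""
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(select_url(collapse_params()))
    print(select_url(grouping_params()))
```

`urlencode` encodes spaces as `+` rather than `%20`, which is an equivalent encoding in a query string. Passing `years=2` gives the strict "last 2 years" window from the question.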

Upvotes: 1
