Dron4K

Reputation: 474

Workaround to group solr search result by collection field

I am a beginner with Solr, so there may be mistakes; sorry in advance. I'm using version 7.7.1. Suppose the index contains the following documents:

{
  "documents": [
    {
      "id": 1,
      "category": [
        "a",
        "b"
      ],
      "score":0.10 //lucene score
    },
    {
      "id": 2,
      "category": [
        "b",
        "c",
        "d",
        "e"
      ],
      "score":0.20 //lucene score
    },
    {
      "id": 3,
      "category": [
        "a",
        "e"
      ],
      "score":0.30 //lucene score
    },
    {
      "id": 4,
      "category": [
        "d",
        "e"
      ],
      "score":0.40 //lucene score
    },
    {
      "id": 5,
      "category": [
        "a",
        "c"
      ],
      "score":0.50 //lucene score
    }
  ]
}

The task is as follows. I receive 3 or more distinct categories, and for each category I need only the one document with the highest score. In other words, I need to group the results by the category field, sort each group by score descending, and limit each group to 1 document.

For example, given the categories a, b, and c, the result has to contain 3 documents:

document with id == 5 for a category
document with id == 2 for b category
document with id == 5 for c category
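
To make the expected result concrete, here is a minimal client-side sketch in Python of the grouping described above. It is not a Solr query, only an illustration of the desired semantics; the function name `top_doc_per_category` is hypothetical, and the `score` values are the sample scores from the documents above.

```python
from typing import Dict, Iterable, List

# Sample documents from the question; "score" stands in for the Lucene score.
DOCS: List[dict] = [
    {"id": 1, "category": ["a", "b"], "score": 0.10},
    {"id": 2, "category": ["b", "c", "d", "e"], "score": 0.20},
    {"id": 3, "category": ["a", "e"], "score": 0.30},
    {"id": 4, "category": ["d", "e"], "score": 0.40},
    {"id": 5, "category": ["a", "c"], "score": 0.50},
]


def top_doc_per_category(docs: Iterable[dict], wanted: Iterable[str]) -> Dict[str, dict]:
    """Return, for each wanted category, the document with the highest score."""
    wanted_set = set(wanted)
    best: Dict[str, dict] = {}
    for doc in docs:
        for cat in doc["category"]:
            if cat not in wanted_set:
                continue
            # Keep the document only if it beats the current best for this category.
            if cat not in best or doc["score"] > best[cat]["score"]:
                best[cat] = doc
    return best


result = top_doc_per_category(DOCS, ["a", "b", "c"])
ids_by_category = {cat: doc["id"] for cat, doc in result.items()}
```

Note that the same document (id 5) is returned for two categories, which is why a plain "group by" on a single-valued key does not fit here.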

Is it possible to write a Solr query that returns such a result in a single request?

I tried the following approaches, but they either did not help or work poorly:

  1. Grouping does not work because the category field is a collection (multi-valued).

  2. Faceting returns only the number of results. I need the complete documents.

  3. It is possible to issue one request per category. But there can be 50 categories at a time, and I guess making 50 requests to Solr would be time-consuming.

Thanks and Regards

Upvotes: 0

Views: 526

Answers (1)

Sanjay Dutt

Reputation: 2222

json.facet can be helpful here. You can use the query below, and it will give you a response that fits your needs. The query first creates buckets for the category field, sorted in ascending index order, and then, within each category, nested buckets of ids sorted by the Lucene score of the query passed via the "qq" param.

I have used a function query to evaluate the Lucene score for the given query. For now it is a simple query, but you can build a more complex one and pass it in the qq parameter. Read https://lucene.apache.org/solr/guide/6_6/function-queries.html to get more information on function queries.

q=*&qq=category:a&json.facet={
    categories:{
        type:terms,
        field:category,
        sort:{index:asc},
        facet:{
            id:{
                type:terms,
                field:id,
                sort:"query_score desc",
                facet:{
                    query_score:"min(if(exists(query($qq)),query($qq),0))"
                }
            }
        }
    }
}
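
Since the question asks for only one document per category, the nested id facet can be capped with the terms facet `limit` option. The Python sketch below builds the request parameters for the query above as strict JSON and adds `limit: 1` to the inner facet. The helper name `build_params` and the `max_categories` argument are assumptions made for this illustration, as is the choice to raise the outer facet's limit so that up to 50 categories fit in one response.

```python
import json
from urllib.parse import urlencode


def build_params(score_query: str, max_categories: int = 50) -> dict:
    """Build Solr request parameters for the nested json.facet query.

    score_query is passed as the "qq" parameter and used for scoring.
    max_categories is a hypothetical knob for the outer facet's bucket limit.
    """
    facet = {
        "categories": {
            "type": "terms",
            "field": "category",
            "sort": {"index": "asc"},
            "limit": max_categories,
            "facet": {
                "id": {
                    "type": "terms",
                    "field": "id",
                    "sort": "query_score desc",
                    "limit": 1,  # keep only the best-scoring id per category
                    "facet": {
                        "query_score": "min(if(exists(query($qq)),query($qq),0))"
                    },
                }
            },
        }
    }
    return {
        "q": "*",
        "qq": score_query,
        "rows": 0,
        "wt": "json",
        "json.facet": json.dumps(facet),
    }


params = build_params("category:a")
query_string = urlencode(params)  # append to the collection's /select URL
```

The resulting `query_string` can be appended to the select handler URL of your own collection; the host and collection name are deliberately left out because they are not given in the question.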

Response to the above query:

{
  "responseHeader":{
    "status":0,
    "QTime":7,
    "params":{
      "qq":"category:a",
      "q":"*",
      "json.facet":"{ categories:{ type:terms, field:category, sort:{index:asc}, facet:{ id:{ type:terms, field:id, sort:\"query_score desc\", facet:{ query_score:\"min(if(exists(query($qq)),query($qq),0))\" } } } } }",
      "indent":"on",
      "fl":"*,query($qq,-1)",
      "rows":"0",
      "wt":"json"}},
  "response":{"numFound":5,"start":0,"docs":[]
  },
  "facets":{
    "count":5,
    "categories":{
      "buckets":[{
          "val":"a",
          "count":3,
          "id":{
            "buckets":[{
                "val":"1",
                "count":1,
                "query_score":0.5389965176582336},
              {
                "val":"3",
                "count":1,
                "query_score":0.5389965176582336},
              {
                "val":"5",
                "count":1,
                "query_score":0.5389965176582336}]}},
        {
          "val":"b",
          "count":2,
          "id":{
            "buckets":[{
                "val":"1",
                "count":1,
                "query_score":0.5389965176582336},
              {
                "val":"2",
                "count":1,
                "query_score":0.0}]}},
        {
          "val":"c",
          "count":2,
          "id":{
            "buckets":[{
                "val":"5",
                "count":1,
                "query_score":0.5389965176582336},
              {
                "val":"2",
                "count":1,
                "query_score":0.0}]}},
        {
          "val":"d",
          "count":2,
          "id":{
            "buckets":[{
                "val":"2",
                "count":1,
                "query_score":0.0},
              {
                "val":"4",
                "count":1,
                "query_score":0.0}]}},
        {
          "val":"e",
          "count":3,
          "id":{
            "buckets":[{
                "val":"3",
                "count":1,
                "query_score":0.5389965176582336},
              {
                "val":"2",
                "count":1,
                "query_score":0.0},
              {
                "val":"4",
                "count":1,
                "query_score":0.0}]}}]}}}
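
Because the inner id buckets are already sorted by query_score descending, the first id bucket in each category bucket is the best match for that category. A minimal Python sketch of picking those out on the client follows; the function name `best_id_per_category` is hypothetical, and `SAMPLE` is a trimmed copy of the response above. The facet returns ids and scores only, so fetching the full documents would still need a follow-up lookup by id.

```python
from typing import Dict, Iterable, Tuple

# Trimmed copy of the facet response shown above (categories a to d only).
SAMPLE = {
    "facets": {
        "count": 5,
        "categories": {
            "buckets": [
                {"val": "a", "count": 3, "id": {"buckets": [
                    {"val": "1", "count": 1, "query_score": 0.5389965176582336},
                    {"val": "3", "count": 1, "query_score": 0.5389965176582336},
                    {"val": "5", "count": 1, "query_score": 0.5389965176582336}]}},
                {"val": "b", "count": 2, "id": {"buckets": [
                    {"val": "1", "count": 1, "query_score": 0.5389965176582336},
                    {"val": "2", "count": 1, "query_score": 0.0}]}},
                {"val": "c", "count": 2, "id": {"buckets": [
                    {"val": "5", "count": 1, "query_score": 0.5389965176582336},
                    {"val": "2", "count": 1, "query_score": 0.0}]}},
                {"val": "d", "count": 2, "id": {"buckets": [
                    {"val": "2", "count": 1, "query_score": 0.0},
                    {"val": "4", "count": 1, "query_score": 0.0}]}},
            ]
        },
    }
}


def best_id_per_category(response: dict, wanted: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each wanted category to (id, query_score) of its first id bucket."""
    wanted_set = set(wanted)
    result: Dict[str, Tuple[str, float]] = {}
    for bucket in response["facets"]["categories"]["buckets"]:
        if bucket["val"] not in wanted_set:
            continue
        id_buckets = bucket.get("id", {}).get("buckets", [])
        if id_buckets:
            top = id_buckets[0]  # buckets arrive sorted by query_score desc
            result[bucket["val"]] = (top["val"], top["query_score"])
    return result


best = best_id_per_category(SAMPLE, ["a", "b", "c"])
```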

For more information on json.facet, see https://lucene.apache.org/solr/guide/7_2/json-facet-api.html

Upvotes: 1
