Solr Error in Group Query

Question

I need to use Solr for address search. City searches have to be filtered by country and state, but I have a problem with the results.

With this query I get all the cities grouped:

Query: country:"SPAIN" AND state:"MURCIA" 
Field: city
Params: group=true&group.field=city&group.format=simple

And I get this:

{
  "responseHeader": {
    "status": 0,
    "QTime": 3,
    "params": {
      "group.format": "simple",
      "fl": "city",
      "indent": "true",
      "q": "country:\"SPAIN\" AND state:\"MURCIA\"",
      "_": "1493920188445",
      "group.field": "city",
      "group": "true",
      "wt": "json"
    }
  },
  "grouped": {
    "city": {
      "matches": 80,
      "doclist": {
        "numFound": 80,
        "start": 0,
        "docs": [
          {
            "city": "CIUDAD DE BUENOS AIRES"
          },
          {
            "city": "CIUDAD DE BUENOS AIRES"
          },
          {
            "city": "VILLA MARTINEZ"
          },
          {
            "city": "PALERMO"
          }
        ]
      }
    }
  }
}

"CIUDAD DE BUENOS AIRES" is duplicated, and another city "VILLA ALBOROTO" is missing.

If I remove the "group.format=simple" parameter, I get this output:

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "fl": "city",
      "indent": "true",
      "q": "country:\"SPAIN\" AND state:\"MURCIA\"",
      "_": "1493920434726",
      "group.field": "city",
      "group": "true",
      "wt": "json"
    }
  },
  "grouped": {
    "city": {
      "matches": 80,
      "groups": [
        {
          "groupValue": "de",
          "doclist": {
            "numFound": 5,
            "start": 0,
            "docs": [
              {
                "city": "CIUDAD DE BUENOS AIRES"
              }
            ]
          }
        },
        {
          "groupValue": "buen",
          "doclist": {
            "numFound": 3,
            "start": 0,
            "docs": [
              {
                "city": "CIUDAD DE BUENOS AIRES"
              }
            ]
          }
        },
        {
          "groupValue": "vill",
          "doclist": {
            "numFound": 2,
            "start": 0,
            "docs": [
              {
                "city": "VILLA MARTINEZ"
              }
            ]
          }
        },
        {
          "groupValue": "palerm",
          "doclist": {
            "numFound": 70,
            "start": 0,
            "docs": [
              {
                "city": "PALERMO"
              }
            ]
          }
        }
      ]
    }
  }
}

I can see that "groupValue" holds a strange value instead of the complete value of the field. I think that is the problem.

My Solr version is 4.10. Does anyone know how to run this query correctly? Thanks.

MatsLindh · Accepted Answer

If you're going to group by a field, that field should be a string field (or a field with a KeywordTokenizer and nothing more than a lowercase filter). What you're seeing is grouping performed on the processed tokens (which is what Solr stores in its index behind the scenes). Using a string field or a KeywordTokenizer with lowercasing avoids splitting and stemming the values.

You can see that "PALERMO" has been processed to "palerm", while "CIUDAD DE BUENOS AIRES" has been split into multiple tokens, among them "de" and "buen". These values are then used for the group operation, giving you a different result than expected.
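
As a rough sketch of what that could look like in schema.xml: the field name city_group and the type name string_lowercase are made up for illustration, and I'm assuming your schema still has the stock "string" type (solr.StrField) from the example configuration. The idea is to leave your existing analyzed city field alone for searching and copy its value into a second, untokenized field that is only used for grouping.

```xml
<!-- schema.xml sketch; city_group and string_lowercase are hypothetical names -->

<!-- Option 1: an untokenized string field, so each city is one exact value.
     Assumes the stock "string" fieldType (solr.StrField) is defined. -->
<field name="city_group" type="string" indexed="true" stored="false"/>

<!-- Option 2: keep the whole value as a single token, only lowercased.
     To use it, declare city_group with type="string_lowercase" instead. -->
<fieldType name="string_lowercase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Copy the original city value into the grouping field at index time -->
<copyField source="city" dest="city_group"/>
```

With something like this in place you would reindex your documents and then group with group=true&group.field=city_group while still requesting fl=city. The groupValue should then be the whole city name (lowercased if you go with option 2) rather than individual stemmed tokens.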
