Reputation: 1318
I need to use Solr for address search. A city search has to be filtered by country and state, but I have a problem with the data.
With this query I get all the cities grouped:
Query: country:"SPAIN" AND state:"MURCIA"
Field: city
Params: group=true&group.field=city&group.format=simple
And I get this:
{
"responseHeader": {
"status": 0,
"QTime": 3,
"params": {
"group.format": "simple",
"fl": "city",
"indent": "true",
"q": "country:\"SPAIN\" AND state:\"MURCIA\"",
"_": "1493920188445",
"group.field": "city",
"group": "true",
"wt": "json"
}
},
"grouped": {
"city": {
"matches": 80,
"doclist": {
"numFound": 80,
"start": 0,
"docs": [
{
"city": "CIUDAD DE BUENOS AIRES"
},
{
"city": "CIUDAD DE BUENOS AIRES"
},
{
"city": "VILLA MARTINEZ"
},
{
"city": "PALERMO"
}
]
}
}
}
}
"CIUDAD DE BUENOS AIRES" is duplicated, and another city "VILLA ALBOROTO" is missing.
If I remove the parameter "group.format=simple", I get this output:
{
"responseHeader": {
"status": 0,
"QTime": 2,
"params": {
"fl": "city",
"indent": "true",
"q": "country:\"SPAIN\" AND state:\"MURCIA\"",
"_": "1493920434726",
"group.field": "city",
"group": "true",
"wt": "json"
}
},
"grouped": {
"city": {
"matches": 80,
"groups": [
{
"groupValue": "de",
"doclist": {
"numFound": 5,
"start": 0,
"docs": [
{
"city": "CIUDAD DE BUENOS AIRES"
}
]
}
},
{
"groupValue": "buen",
"doclist": {
"numFound": 3,
"start": 0,
"docs": [
{
"city": "CIUDAD DE BUENOS AIRES"
}
]
}
},
{
"groupValue": "vill",
"doclist": {
"numFound": 2,
"start": 0,
"docs": [
{
"city": "VILLA MARTINEZ"
}
]
}
},
{
"groupValue": "palerm",
"doclist": {
"numFound": 70,
"start": 0,
"docs": [
{
"city": "PALERMO"
}
]
}
}
]
}
}
}
I can see that "groupValue" holds a strange partial value instead of the complete value of the field. I think that is the problem.
My Solr version is 4.10. Does anyone know how to write this query correctly? Thanks.
Upvotes: 0
Views: 230
Reputation: 52832
If you're going to group by a field, that field should be a string field (or a field with a KeywordTokenizer with nothing more than a lowercase filter). What you're seeing is grouping performed on the processed tokens (which is what Solr has in its index behind the scenes). Using a string field or a KeywordTokenizer with lowercasing avoids splitting and stemming the field values.
You can see that "PALERMO" has been processed to "palerm", while "CIUDAD DE BUENOS AIRES" has been split into multiple tokens, among them "de" and "buen". These values are then used for the group operation, giving you a different result than expected.
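As a rough sketch of what that could look like in schema.xml: the names `city_str` and `string_lowercase` are made up for illustration, and the first option assumes your schema already defines the usual `string` type (solr.StrField). Either option should work; you only need one of them.

```xml
<!-- Hypothetical field and type names; adapt them to your own schema.xml -->

<!-- Option 1: an untokenized copy of "city", used only for grouping -->
<field name="city_str" type="string" indexed="true" stored="true"/>
<copyField source="city" dest="city_str"/>

<!-- Option 2: a type that keeps the whole value as one lowercased token -->
<fieldType name="string_lowercase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing the schema you have to reindex the documents. With option 1 the grouping parameters would then be group=true&group.field=city_str&group.format=simple, while the existing "city" field can stay tokenized for searching.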
Upvotes: 1