Daryl

Reputation: 221

Best way to store hierarchical data of arbitrary length in Elasticsearch

I am trying to store hierarchical data for geographical regions in elasticsearch for the purpose of fuzzy matching.

Example

USA->California->Santa Clara County->Palo Alto->Palo Alto St.

Currently I am storing everything in a flat structure, with a name and a path, for example:

{
    "name" : "Palo Alto St.",
    "path" : [
        "USA",
        "California",
        "Santa Clara County",
        "Palo Alto"
    ]
}

The document for "Palo Alto" looks similar:

{
    "name" : "Palo Alto",
    "path" : [
        "USA",
        "California",
        "Santa Clara County"
    ]
}

I then run a fuzzy_like_this query like this:

{
    "query": {
        "fuzzy_like_this": {
            "like_text": "palo alto",
            "fields": [
                "name",
                "path"
            ],
            "min_similarity": 0.9,
            "prefix_length": 2
        }
    }
}

Unfortunately, this doesn't work well when I want, for example, to boost the score of the result that is the least deep in the tree.

Is there some standard way to do this, such as parent/child relationships? I did a bit of digging but there is no mention of hierarchical data of arbitrary length.

I did a bit of playing around with the custom score query, but it doesn't seem flexible enough for this purpose, or maybe my understanding is a bit too superficial.

Thanks for reading

Upvotes: 2

Views: 1071

Answers (1)

J.T.

Reputation: 2616

I will give this one a stab! I'm going to suggest you make use of nested documents and the custom_filters_score query (replaced by the function_score query in 0.90.4; I have yet to investigate that one entirely).

First, let's put the "name" field into the hierarchy path as well, just for fun. You could always boost the name field match by 1.5x or 2x and leave the fields as is, but this will give us more clarity. I'm also going to require that you change your data format to look like this:

{
    "name": "Palo Alto",
    "path": [
        {
            "name": "USA",
            "level": 1
        },
        {
            "name": "California",
            "level": 2
        },
        {
            "name": "Santa Clara County",
            "level": 3
        },
        {
            "name": "Palo Alto",
            "level": 4
        }
    ]
}
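
If your existing documents are in the flat format from the question, a small conversion step could produce the shape above. This is only a rough sketch in Python: the function name to_nested_doc is made up for illustration, and it assumes levels start at 1 and that the document's own name becomes the deepest path entry, as in my example.

```python
def to_nested_doc(flat_doc):
    """Convert {"name": ..., "path": [ancestor names]} into the nested format."""
    # Ancestors first, then the document's own name as the deepest entry.
    names = list(flat_doc["path"]) + [flat_doc["name"]]
    return {
        "name": flat_doc["name"],
        # Levels are 1-based: the root ("USA" here) gets level 1.
        "path": [
            {"name": name, "level": level}
            for level, name in enumerate(names, start=1)
        ],
    }


flat = {
    "name": "Palo Alto",
    "path": ["USA", "California", "Santa Clara County"],
}
nested = to_nested_doc(flat)
```

Here nested["path"] ends with the "Palo Alto" entry at level 4, matching the document shown above.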

Now, I'm going to change you over to a nested document for your field mapping.

{
    "my_document": {
        "properties": {
            "path": {
                "type": "nested",
                "properties": {
                    "name": {
                        "type": "string"
                    },
                    "level": {
                        "type": "integer"
                    }
                }
            }
        }
    }
}

Now we are going to run a query, and this is going to seem a bit crazy. To be honest, there are some alternatives with a bit of customization that will get you there. I could also be overlooking something obvious; if so, I'd love to hear it. Hey, it's late!

{
    "query": {
        "custom_filters_score": {
            "query": {
                "constant_score": {
                    "query": {
                        "nested": {
                            "path": "path",
                            "query": {
                                "fuzzy_like_this_field": {
                                    "path.name": {
                                        "like_text": "palo alto",
                                        "min_similarity": 0.9,
                                        "prefix_length": 2
                                    }
                                }
                            }
                        }
                    },
                    "boost": 1
                }
            },
            "filters": [
                {
                    "filter": {
                        "nested": {
                            "path": "path",
                            "query": {
                                "bool": {
                                    "must": [
                                        {
                                            "fuzzy_like_this_field": {
                                                "path.name": {
                                                    "like_text": "palo alto",
                                                    "min_similarity": 0.9,
                                                    "prefix_length": 2
                                                }
                                            }
                                        },
                                        {
                                            "term": {
                                                "path.level": 2
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    },
                    "boost": 1
                },
                {
                    "filter": {
                        "nested": {
                            "path": "path",
                            "query": {
                                "bool": {
                                    "must": [
                                        {
                                            "fuzzy_like_this_field": {
                                                "path.name": {
                                                    "like_text": "palo alto",
                                                    "min_similarity": 0.9,
                                                    "prefix_length": 2
                                                }
                                            }
                                        },
                                        {
                                            "term": {
                                                "path.level": 3
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    },
                    "boost": 2
                },
                {
                    "filter": {
                        "nested": {
                            "path": "path",
                            "query": {
                                "bool": {
                                    "must": [
                                        {
                                            "fuzzy_like_this_field": {
                                                "path.name": {
                                                    "like_text": "palo alto",
                                                    "min_similarity": 0.9,
                                                    "prefix_length": 2
                                                }
                                            }
                                        },
                                        {
                                            "term": {
                                                "path.level": 4
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    },
                    "boost": 3
                },
                {
                    "filter": {
                        "nested": {
                            "path": "path",
                            "query": {
                                "bool": {
                                    "must": [
                                        {
                                            "fuzzy_like_this_field": {
                                                "path.name": {
                                                    "like_text": "palo alto",
                                                    "min_similarity": 0.9,
                                                    "prefix_length": 2
                                                }
                                            }
                                        },
                                        {
                                            "term": {
                                                "path.level": 5
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    },
                    "boost": 4
                },
                {
                    "filter": {
                        "nested": {
                            "path": "path",
                            "query": {
                                "bool": {
                                    "must": [
                                        {
                                            "fuzzy_like_this_field": {
                                                "path.name": {
                                                    "like_text": "palo alto",
                                                    "min_similarity": 0.9,
                                                    "prefix_length": 2
                                                }
                                            }
                                        },
                                        {
                                            "term": {
                                                "path.level": 6
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    },
                    "boost": 5
                }
            ],
            "score_mode": "max"
        }
    }
}
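
Since the filter entries differ only in the level and the boost, you may prefer to generate the request body rather than write it by hand. The following is a minimal sketch in Python, not a tested client integration: level_filter, build_query and the max_level parameter are names I made up, the boost rule (level minus 1) simply mirrors the query above, and it uses the fully qualified path.name and path.level field names inside the nested clauses.

```python
import json


def fuzzy_name_clause(text):
    """Fuzzy match on the nested name field, same settings as the query above."""
    return {
        "fuzzy_like_this_field": {
            "path.name": {
                "like_text": text,
                "min_similarity": 0.9,
                "prefix_length": 2,
            }
        }
    }


def level_filter(text, level, boost):
    """One custom_filters_score entry: name match restricted to a single level."""
    return {
        "filter": {
            "nested": {
                "path": "path",
                "query": {
                    "bool": {
                        "must": [
                            fuzzy_name_clause(text),
                            {"term": {"path.level": level}},
                        ]
                    }
                },
            }
        },
        "boost": boost,
    }


def build_query(text, max_level=6):
    """Build the full request body for levels 2..max_level."""
    base = {
        "constant_score": {
            "query": {"nested": {"path": "path", "query": fuzzy_name_clause(text)}},
            "boost": 1,
        }
    }
    filters = [level_filter(text, lvl, lvl - 1) for lvl in range(2, max_level + 1)]
    return {
        "query": {
            "custom_filters_score": {
                "query": base,
                "filters": filters,
                "score_mode": "max",
            }
        }
    }


body = build_query("palo alto")
payload = json.dumps(body, indent=4)  # JSON string to send as the search body
```

With the default max_level of 6 this produces five filter entries with boosts 1 through 5. If your hierarchy goes deeper, raise max_level; if you want shallower matches to win instead, change the boost expression passed to level_filter.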

Upvotes: 1
