Andy
Andy

Reputation: 1043

Django, haystack, elastic search and one to many relation

I have problem with haystack - I do not know how to search for models A whose all foreign keys meet given condition.

My simplified models look like:

Group:
    id
Meeting:
    group = models.ForeignKey(Group)
    day_of_week = models.IntegerField()
    hour = models.IntegerField()
    length = models.IntegerField()

So basically, a group can have many meetings and users should be able to search for those groups whose all meetings are in given time range. eg:

Group(1)
    Meeting(day_of_week=Monday, hour=9, length=2)
Group(2)
    Meeting(day_of_week=Monday, hour=10, length=1)
    Meeting(day_of_week=Tuesday, hour=8, length=2)
Group(3)
    Meeting(day_of_week=Monday, hour=10, length=1)
    Meeting(day_of_week=Wednesday, hour=12, length=1)

and search: "Monday from 8 to 11", "Tuesday, from 12 to 14 (2p.m.)", "Wednesday, from 6 to 17 (5p.m.)" should return group 1 and 3, because all meetings from those groups contains in user specified ranges and group 2 is not returned, because second meeting is not in given range (tho the first one is).

If I was to write a SQL, I would probably go for something like "select count of matching meetings and count of all meetings if those numbers are equal -> then all meetings meet:

SELECT g.id,
       count(m2.id)
FROM groups g
JOIN meetings m2 ON m2.group_id = g.id
AND ((m2.day_of_week = 0  -- monday
      AND m2.hour >= 8
      AND m2.length<=3)
     OR (m2.day_of_week=1  -- tuesday
         AND m2.hour >= 12
         AND m2.length<=2)
     OR (m2.day_of_week=2 -- wednesday
         AND m2.hour >= 6
         AND m2.length<=11))
GROUP BY g.id
HAVING count(m2.id) =
  (SELECT count(*)
   FROM meetings
   WHERE meetings.group_id=g.id);

But we are using haystack + elastic search for indexing and I have completely no idea how to flatten the model to index and write a query. Can anyone help me with it?

Upvotes: 4

Views: 590

Answers (2)

Andy
Andy

Reputation: 1043

ElasticSearch solution

The key to the solution is ElasticSearch feature named nested objects. Luckily this feature is present in all ES versions. Nested object is key here, because data in meeting are strictly associated.

PUT /myindex
{
  "mappings": {
    "groups": {
      "properties": {
        "meetings": {
          "type": "nested", 
          "properties": {
             "dayOfWeek": { "type": "integer"},
             "start": {"type": "integer"},
             "end": {"type": "integer"}
          }
        },
        "groupId": {"type":"integer"}
      }
    }
  }
}

POST /myindex/groups/_bulk
{"index": {}}
{"groupId": 1, "meetings": [{"dayOfWeek": 0, "start": 9,  "end": 11}]}
{"index": {}}
{"groupId": 2, "meetings": [{"dayOfWeek": 0, "start": 10, "end": 11}, { "dayOfWeek": 1, "start": 8,  "end": 10}]}
{"index": {}}
{"groupId": 3, "meetings": [{"dayOfWeek": 0, "start": 10, "end": 11}, {"dayOfWeek": 2, "start": 12, "end": 13}]}

At this point it is clearly visible that meetings belong to groups and we will search in groups.

It is not possible to directly write query to get all groups whose all nested objects meet condition, but... it can be inversed easily to: get all groups where none of the meeting contains wrong time.

GET /myindex/_search
{
  "query": {
    "bool": {
      "must_not" : {
       "nested": {
          "path": "meetings",
          "filter": {
              "bool": {
                "must_not": {
                  "bool": {
                    "should": [
                        {
                          "bool": {
                            "must": [
                              {"term" : { "dayOfWeek" : 0 }},
                              {"range": {"start": {"from":8, "to":11}}},
                              {"range": {"end": {"from":8, "to":11}}}
                            ]
                          }
                        },
                        {
                          "bool": {
                            "must": [
                              {"term" : { "dayOfWeek" : 1 }},
                              {"range": {"start": {"from":12, "to":14}}},
                              {"range": {"end": {"from":12, "to":14}}}
                            ]
                          }
                        },
                        {
                          "bool": {
                            "must": [
                              {"term" : { "dayOfWeek" : 2 }},
                              {"range": {"start": {"from":6, "to":17}}},
                              {"range": {"end": {"from":6, "to":17}}}
                            ]
                          }
                        }                        
                      ]
                    }
                }
              }
            }
          }
       }
      }
    }
  }

This will return groups 1 and 3. Group 2 will not be returned because one of this meetings overlaps with wrong datetime.

Haystack integration

The second problem was integration with Django Haystack, because by default it doesn't support engine-specific features like nested fields in ES. Luckily I am not the only one who needs it in django app and someone has already resolved it.

Upvotes: 1

Val
Val

Reputation: 217254

You probably need to flatten your documents in such a way that all your documents must be meetings containing group information.

** Solution for ES 5 **

The mapping of your document would be:

PUT /meetings
{
    "mappings": {
       "meeting": {
          "properties": {
             "groupId": {
                "type": "integer"
             },
             "dayOfWeek": {
                "type": "integer"
             },
             "hourRange": {
                "type": "integer_range"
             }
          }
       }
    }
}

Then your five documents would look like this:

POST /meetings/meeting/_bulk
{"index": {}}
{"groupId": 1, "dayOfWeek": 0, "hourRange": {"gte": 9, "lte": 11}}
{"index": {}}
{"groupId": 2, "dayOfWeek": 0, "hourRange": {"gte": 10, "lte": 11}}
{"index": {}}
{"groupId": 2, "dayOfWeek": 1, "hourRange": {"gte": 8, "lte": 10}}
{"index": {}}
{"groupId": 3, "dayOfWeek": 0, "hourRange": {"gte": 10, "lte": 11}}
{"index": {}}
{"groupId": 3, "dayOfWeek": 2, "hourRange": {"gte": 12, "lte": 13}}

Finally, the query would look like this:

POST /meetings/meeting/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "dayOfWeek": 0
                }
              },
              {
                "range": {
                  "hourRange": {
                    "gte": "8",
                    "lte": "11",
                    "relation": "within"
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "dayOfWeek": 1
                }
              },
              {
                "range": {
                  "hourRange": {
                    "gte": "12",
                    "lte": "14",
                    "relation": "within"
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "dayOfWeek": 2
                }
              },
              {
                "range": {
                  "hourRange": {
                    "gte": "6",
                    "lte": "17",
                    "relation": "within"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

** Solution for ES <5 **

PUT /meetings
{
    "mappings": {
       "meeting": {
          "properties": {
             "groupId": {
                "type": "integer"
             },
             "dayOfWeek": {
                "type": "integer"
             },
             "start": {
                "type": "integer"
             },
             "end": {
                "type": "integer"
             }
          }
       }
    }
}

Then your five documents would look like this:

POST /meetings/meeting/_bulk
{"index": {}}
{"groupId": 1, "dayOfWeek": 0, "start": 9,  "end": 11}
{"index": {}}
{"groupId": 2, "dayOfWeek": 0, "start": 10, "end": 11}
{"index": {}}
{"groupId": 2, "dayOfWeek": 1, "start": 8,  "end": 10}
{"index": {}}
{"groupId": 3, "dayOfWeek": 0, "start": 10, "end": 11}
{"index": {}}
{"groupId": 3, "dayOfWeek": 2, "start": 12, "end": 13}

Finally, the query would look like this:

POST /meetings/meeting/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "dayOfWeek": 0
                }
              },
              {
                "range": {
                  "start": {
                    "gte": "8"
                  }
                }
              },
              {
                "range": {
                  "end": {
                    "lte": "11"
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "dayOfWeek": 1
                }
              },
              {
                "range": {
                  "start": {
                    "gte": "12"
                  }
                }
              },
              {
                "range": {
                  "end": {
                    "lte": "14"
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "dayOfWeek": 2
                }
              },
              {
                "range": {
                  "start": {
                    "gte": "6"
                  }
                }
              },
              {
                "range": {
                  "end": {
                    "lte": "17"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Upvotes: 1

Related Questions