Vahid

Reputation: 407

Qdrant filtering using nested object fields

My points in Qdrant have a payload that looks something like this:


{
    "attributes": [
        {
            "attribute_value_id": 22003,
            "id": 1252,
            "key": "Environment",
            "value": "Casual/Daily",
        },
        {
            "attribute_value_id": 98763,
            "id": 1254,
            "key": "Color",
            "value": "Multicolored",
        },
        {
            "attribute_value_id": 22040,
            "id": 1255,
            "key": "Material",
            "value": "Polyester",
        },
    ],
    "brand": {
        "id": 114326,
        "logo": None,
        "slug": "happiness-istanbul-114326",
        "title": "Happiness Istanbul",
    },
}

Following the Qdrant documentation, I implemented filtering on brand like this:

from qdrant_client import models

filters_list = []
if param_filters:
    brands = param_filters.get("brand_params")
    if brands:
        # Match points whose brand.id is any of the requested brand ids
        brand_filter = models.FieldCondition(
            key="brand.id",
            match=models.MatchAny(any=[int(brand) for brand in brands]),
        )
        filters_list.append(brand_filter)
search_results = qd_client.search(
    query_filter=models.Filter(must=filters_list),
    collection_name=f"lang{lang}_products",
    query_vector=query_vector,
    search_params=models.SearchParams(hnsw_ef=128, exact=False),
    limit=limit,
)

This works so far. Things get complicated, though, when I try to filter on the "attributes" field. As you can see, it is a list of dictionaries, each one like this:

{
    "attribute_value_id": 22040,
    "id": 1255,
    "key": "Material",
    "value": "Polyester",
}

And the attrs filter sent from the front-end is in this structure:

attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
>>> example: {'1237': ['21727', '21759'], '1254': ['52776']}

How can I filter so that each attr_id provided in the query filter params (here, 1237 and 1254) exists in the attributes field and has one of the attr_value_ids provided in its list (e.g. ['21727', '21759'] here)?

This is what I've tried so far:

if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    print("attrs from search function:", attrs)
    for attr_id, attr_value_ids in attrs.items():
        # Convert attribute value IDs to integers
        attr_value_ids = [
            int(attr_value_id) for attr_value_id in attr_value_ids
        ]
        # Add a filter for each attribute ID and its values
        filter = models.FieldCondition(
            key=f"attributes.{attr_id}.attr_value_id",
            match=models.MatchAny(any=attr_value_ids),
        )
        filters_list.append(filter)

The problem is that key=f"attributes.{attr_id}.attr_value_id" is wrong, and I do not know how to express this correctly.

UPDATE: Maybe one step closer:

I decided to flatten the data in the db, hoping that would make this easier. First, I created a new field named flattened_attributes, which looks like this:

[
  {
    "1237": 21720
  },
  {
    "1254": 52791
  },
  {
    "1255": 22044
  },
]

Then, before filtering, I applied the same flattening to the attr filters sent from the front-end:

if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    # we need to flatten attrs to filter on payloads
    flattened_attr = []
    for attr_id, attr_value_ids in attrs.items():
        for attr_value_id in attr_value_ids:
            flattened_attr.append({attr_id: int(attr_value_id)})

Now I have two similar lists of dicts, and I want to keep the points whose flattened_attributes contains at least one of the dicts received from the front-end (flattened_attr).

There is one type of filter that checks whether the value of a key is in a list of values, as mentioned here in the docs. But I do not know how to check whether a dict exists in the flattened_attributes field in the db.

Upvotes: 2

Views: 525

Answers (1)

Vahid

Reputation: 407

NOTE: The approach in the update to the question was a dead end (or I just could not follow it through), so I came up with another approach, which solved the problem.

Looking at the structure of the attributes field in the question, each list item has an id key that identifies the attribute (e.g. 1254 for "Color" and 1255 for "Material") and an attribute_value_id key that identifies the selected value for that attribute.

So, in the search function, I wrote the following code (I will go through it):

attrs = param_filters.get("attr_params")
if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    # we need to flatten attrs to filter on payloads
    for attr_id, attr_value_ids in attrs.items():
        flattened_attr = []
        for attr_value_id in attr_value_ids:
            flattened_attr.append(int(attr_value_id))

        filter = models.FieldCondition(
            key="attributes[].attribute_value_id",
            match=models.MatchAny(any=flattened_attr),
        )
        filters_list.append(filter)
search_results = qd_client.search(
    query_filter=models.Filter(must=filters_list),
    collection_name=f"lang{lang}_products",
    query_vector=query_vector,
    search_params=models.SearchParams(hnsw_ef=128, exact=False),
    limit=limit,
)

First, for each attr_id I created a separate list containing the attr_value_ids (I had to convert them to int).

Then, following the Qdrant documentation (here), I used key="attributes[].attribute_value_id" to go through the list items inside the attributes field, look for the attribute_value_id key inside each list item (each is a dictionary), and match it against the values sent.
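To make the attributes[] path easier to picture, here is a rough plain-Python sketch of how I think of it: the [] suffix fans out over every item of the list, and the condition matches if any collected value is in the allowed set. This is only a mental model, not Qdrant's actual implementation; resolve_path and match_any are hypothetical helper names, and the payload is trimmed from the question.

```python
from typing import Any


def resolve_path(payload: Any, path: str) -> list:
    """Collect every value reachable via a dotted path; 'name[]' fans out over a list."""
    current = [payload]
    for part in path.split("."):
        fan_out = part.endswith("[]")
        name = part[:-2] if fan_out else part
        next_level = []
        for node in current:
            if not isinstance(node, dict) or name not in node:
                continue  # path does not exist on this branch
            value = node[name]
            if fan_out and isinstance(value, list):
                next_level.extend(value)  # one branch per list item
            else:
                next_level.append(value)
        current = next_level
    return current


def match_any(payload: Any, key: str, allowed: list) -> bool:
    """True if at least one value found under `key` is in `allowed`."""
    return any(value in allowed for value in resolve_path(payload, key))


# Trimmed version of the payload from the question
payload = {
    "attributes": [
        {"attribute_value_id": 22003, "id": 1252},
        {"attribute_value_id": 98763, "id": 1254},
        {"attribute_value_id": 22040, "id": 1255},
    ],
    "brand": {"id": 114326},
}

print(resolve_path(payload, "attributes[].attribute_value_id"))
print(match_any(payload, "attributes[].attribute_value_id", [22040, 1]))
```

The same helper also handles a plain nested key such as "brand.id", which is why the brand filter did not need the [] suffix.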

Also, note that I am creating a separate filter for each attr_id:

for attr_id, attr_value_ids in attrs.items():
    flattened_attr = []
    for attr_value_id in attr_value_ids:
        flattened_attr.append(int(attr_value_id))

    filter = models.FieldCondition(
        key="attributes[].attribute_value_id",
        match=models.MatchAny(any=flattened_attr),
    )
    filters_list.append(filter)

This is because when multiple values are sent for one attr_id, at least one of them should match (OR between the attribute_value_ids), but when another attr_id is sent, both the new one and the previous one should match (AND between the attr_ids). Also, note that I am using must in the main filter, so each filter has to be satisfied separately, while inside each filter any of the value ids is acceptable.

qd_client.search(
    query_filter=models.Filter(must=filters_list),
    collection_name=f"lang{lang}_products",
    query_vector=query_vector,
    search_params=models.SearchParams(hnsw_ef=128, exact=False),
    limit=limit,
)
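One caveat: matching on attributes[].attribute_value_id alone does not check which attribute a value belongs to. That is fine as long as value ids are unique across attributes. If the pairing matters, Qdrant's nested filter condition evaluates several conditions against the same list item. Below is a minimal sketch that builds such a filter in the REST JSON shape using plain dicts. build_attr_filter is a hypothetical helper name, not part of Qdrant, and I have not run this against the collection above. I believe the Python client has matching models (models.NestedCondition and models.Nested), but check the docs for your client version.

```python
import json


def build_attr_filter(attrs: dict) -> dict:
    """Hypothetical helper: one nested condition per attr_id (AND between them),
    each accepting any of the listed value ids (OR inside)."""
    must = []
    for attr_id, value_ids in attrs.items():
        must.append(
            {
                "nested": {
                    # Inner keys are relative to each item of "attributes"
                    "key": "attributes",
                    "filter": {
                        "must": [
                            {"key": "id", "match": {"value": int(attr_id)}},
                            {
                                "key": "attribute_value_id",
                                "match": {"any": [int(v) for v in value_ids]},
                            },
                        ]
                    },
                }
            }
        )
    return {"must": must}


# Same shape as the attrs example sent from the front-end
attrs = {"1237": ["21727", "21759"], "1254": ["52776"]}
query_filter = build_attr_filter(attrs)
print(json.dumps(query_filter, indent=2))
```

With this shape, a point only matches when a single list item has both the requested id and one of the requested attribute_value_ids.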

Upvotes: 1
