Emil Jessen
Emil Jessen

Reputation: 47

Pandas: Select rows whose dictionary contains a specific value

I have a dataframe, in which one column contain a dictionaries for every row. I want to select rows whose dictionary contains a specific value. Doesn't matter which key contains it.

The dictionaries have many levels (they contain a lot of lists, with a lot of dictionaries, again with a lot of lists and so on). The data could look similar to this, but with the dictionaries being more complex:

df = pd.DataFrame({"A": [1,2,3], "B": [{"a":1}, {"b":**specific_value**}, {"c":3}]})
   A         B
0  1  {'a': 1}
1  2  {'b': 2}
2  3  {'c': 3}

I tried:

df.B.apply(lambda x : 'specific_value' in x.values())

To which I get "false" even the rows that I know contain the 'specific_value'. I am unsure if it is because of the layers.

Upvotes: 1

Views: 1460

Answers (1)

Dani Mesejo
Dani Mesejo

Reputation: 61920

You could use a recursive function to search for the specific value:

import pandas as pd


def nested_find_value(d, needle=4):
    # we assume d is always a list or dictionary
    haystack = d.values() if isinstance(d, dict) else d
    
    for hay in haystack:
        if isinstance(hay, (list, dict)):
            yield from nested_find_value(hay, needle)
        else:
            yield hay == needle


def find(d, needle=4):
    return any(nested_find_value(d, needle))


df = pd.DataFrame({"A": [1, 2, 3], "B": [{"a": 1}, {"b": {"d": 4}}, {"c": 3}]})

result = df["B"].apply(find)
print(result)

Output

0    False
1     True
2    False
Name: B, dtype: bool

In the example above the specific value is 4.

Upvotes: 1

Related Questions