Reputation: 47
I have a dataframe, in which one column contain a dictionaries for every row. I want to select rows whose dictionary contains a specific value. Doesn't matter which key contains it.
The dictionaries have many levels (they contain a lot of lists, with a lot of dictionaries, again with a lot of lists and so on). The data could look similar to this, but with the dictionaries being more complex:
df = pd.DataFrame({"A": [1,2,3], "B": [{"a":1}, {"b":**specific_value**}, {"c":3}]})
A B
0 1 {'a': 1}
1 2 {'b': 2}
2 3 {'c': 3}
I tried:
df.B.apply(lambda x : 'specific_value' in x.values())
To which I get "false" even the rows that I know contain the 'specific_value'. I am unsure if it is because of the layers.
Upvotes: 1
Views: 1460
Reputation: 61920
You could use a recursive function to search for the specific value:
import pandas as pd
def nested_find_value(d, needle=4):
# we assume d is always a list or dictionary
haystack = d.values() if isinstance(d, dict) else d
for hay in haystack:
if isinstance(hay, (list, dict)):
yield from nested_find_value(hay, needle)
else:
yield hay == needle
def find(d, needle=4):
return any(nested_find_value(d, needle))
df = pd.DataFrame({"A": [1, 2, 3], "B": [{"a": 1}, {"b": {"d": 4}}, {"c": 3}]})
result = df["B"].apply(find)
print(result)
Output
0 False
1 True
2 False
Name: B, dtype: bool
In the example above the specific value is 4
.
Upvotes: 1