Reputation: 1858
I have the following pandas DataFrame. There are two columns, B and C, composed of lists of multiple tuples.
import pandas as pd
dictionary_input = {'A': [5, 6, 3, 4],
                    'B': [[('AA', 4, 11), ('ABC', 28, 99), ('ABC', 23, 86)], [('AA', 2, 10)], [('ABC', 56, 76), ('BB', 15, 183)], [('BB', 15, 183)]],
                    'C': [[('XYZ', 7, 9), ('XX', 24, 33), ('BB', 179, 184)], [('XX', 72, 75)], [('ABC', 25, 45)], [('BB', 91, 187)]]}
df = pd.DataFrame(dictionary_input)
print(df)
which results in:
A B C
0 5 [(AA, 4, 11), (ABC, 28, 99), (ABC, 23, 86)] [(XYZ, 7, 9), (XX, 24, 33), (BB, 179, 184)]
1 6 [(AA, 2, 10)] [(XX, 72, 75)]
2 3 [(ABC, 56, 76), (BB, 15, 183)] [(ABC, 25, 45)]
3 4 [(BB, 15, 183)] [(BB, 91, 187)]
My problem is that I would like to subset this DataFrame based on the values in the lists of tuples, i.e. keep only the rows whose list contains a given tuple.
If I were to subset the DataFrame on the condition that B contains the tuple ('BB', 15, 183), then the output would be:
A B C
2 3 [(ABC, 56, 76), (BB, 15, 183)] [(ABC, 25, 45)]
3 4 [(BB, 15, 183)] [(BB, 91, 187)]
I tried to accomplish this using
df[df.B.isin(('BB', 15, 183))]
But this is wrong, as it gives me an empty DataFrame.
How do I subset a pandas DataFrame based on values inside a list, if the values are tuples?
Upvotes: 1
Views: 475
Reputation: 150735
If you are working with pandas 0.25+, you can make use of explode, which turns the list in each cell into separate rows and concatenates them into a single Series. This is similar to pd.concat(pd.Series(x) for x in df['B']), but keeps the original index. Then you can compare that Series to your tuple and use groupby on the index level to check whether any element of each original row matches:
s = df['B'].explode()
df[(s == ('BB', 15, 183)).groupby(level=0).any()]
Output:
A B C
2 3 [(ABC, 56, 76), (BB, 15, 183)] [(ABC, 25, 45)]
3 4 [(BB, 15, 183)] [(BB, 91, 187)]
Output (s):
0 (AA, 4, 11)
0 (ABC, 28, 99)
0 (ABC, 23, 86)
1 (AA, 2, 10)
2 (ABC, 56, 76)
2 (BB, 15, 183)
3 (BB, 15, 183)
Name: B, dtype: object
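For pandas versions older than 0.25, where explode is not available, the pd.concat alternative mentioned above can be made to keep the row labels as well. The sketch below is a minimal version under the assumption that passing keys=df.index to pd.concat is an acceptable way to record each cell's original row as the first index level; the helper name rows_containing is hypothetical. Rows whose list is empty contribute no entries, so the mask is reindexed with fill_value=False before indexing.

```python
import pandas as pd


def rows_containing(df, column, target):
    """Hypothetical helper: return the rows of `df` whose list in `column`
    contains the tuple `target`. Uses pd.concat instead of Series.explode,
    so it does not need pandas 0.25+."""
    # One object-dtype Series per cell; keys=df.index stores each cell's
    # original row label as level 0 of the resulting MultiIndex.
    flat = pd.concat(
        [pd.Series(cell, dtype=object) for cell in df[column]],
        keys=df.index,
    )
    # On object dtype, the tuple is compared against each element as a whole.
    matches = flat == target
    # Collapse back to one boolean per original row; rows whose list was
    # empty have no entries at all, so they are filled in as False.
    mask = matches.groupby(level=0).any().reindex(df.index, fill_value=False)
    return df[mask]


df = pd.DataFrame({
    'A': [5, 6, 3, 4],
    'B': [[('AA', 4, 11), ('ABC', 28, 99), ('ABC', 23, 86)], [('AA', 2, 10)],
          [('ABC', 56, 76), ('BB', 15, 183)], [('BB', 15, 183)]],
    'C': [[('XYZ', 7, 9), ('XX', 24, 33), ('BB', 179, 184)], [('XX', 72, 75)],
          [('ABC', 25, 45)], [('BB', 91, 187)]],
})

subset = rows_containing(df, 'B', ('BB', 15, 183))
print(subset)
```

The same helper can be pointed at column C, e.g. rows_containing(df, 'C', ('XX', 72, 75)).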
Upvotes: 2
Reputation: 91
You can do this with the apply method:
df[df['B'].apply(lambda x: ('BB', 15, 183) in x)]
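A self-contained sketch of the apply approach, alongside an equivalent plain list comprehension that builds the same boolean mask without going through Series.apply. The `in` test uses ordinary Python tuple equality, so only tuples that are equal in every position match. The variable names are illustrative, and the sketch assumes every cell in B holds a list: a NaN cell would make `in` raise a TypeError.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 6, 3, 4],
    'B': [[('AA', 4, 11), ('ABC', 28, 99), ('ABC', 23, 86)], [('AA', 2, 10)],
          [('ABC', 56, 76), ('BB', 15, 183)], [('BB', 15, 183)]],
    'C': [[('XYZ', 7, 9), ('XX', 24, 33), ('BB', 179, 184)], [('XX', 72, 75)],
          [('ABC', 25, 45)], [('BB', 91, 187)]],
})
target = ('BB', 15, 183)

# Series.apply: the lambda is called once per cell (each cell is a list of tuples).
by_apply = df[df['B'].apply(lambda cell: target in cell)]

# List comprehension: the same membership test, producing a plain list of bools
# that pandas accepts as a boolean row mask.
by_listcomp = df[[target in cell for cell in df['B']]]

print(by_apply)
```

Both masks come from the same `target in cell` check, so they select the same rows; the list comprehension simply skips the per-call overhead of Series.apply.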
Upvotes: 1