Reputation: 47
I am trying to filter a dataframe using the isin() function by passing in a list and comparing with a dataframe column that also contains lists. This is an extension of the question below:
How to implement 'in' and 'not in' for Pandas dataframe
For example, instead of having one country in each row, now each row contains a list of countries.
df = pd.DataFrame({'countries':[['US', 'UK'], ['UK'], ['Germany', 'France'], ['China']]})
And to filter, I set two separate lists:
countries = ['UK','US']
countries_2 = ['UK']
The intended results should be the same, because rows 0 and 1 both contain UK and/or US:
>>> df[df.countries.isin(countries)]
countries
0 US, UK
1 UK
>>> df[df.countries.isin(countries_2)]
countries
0 US, UK
1 UK
However, Python raised the following error:
TypeError: unhashable type: 'list'
Upvotes: 1
Views: 1465
Reputation: 862611
One possible solution uses sets with issubset or isdisjoint, combined with map:
print (df[df.countries.map(set(countries).issubset)])
countries
0 [US, UK]
print (df[~df.countries.map(set(countries).isdisjoint)])
countries
0 [US, UK]
1 [UK]
print (df[df.countries.map(set(countries_2).issubset)])
countries
0 [US, UK]
1 [UK]
print (df[~df.countries.map(set(countries_2).isdisjoint)])
countries
0 [US, UK]
1 [UK]
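Note that the two methods answer different questions: issubset keeps rows that contain every country in the filter list, while the negated isdisjoint keeps rows that contain at least one of them. A minimal self-contained sketch of both, using the data from the question (the names wanted, any_mask, all_mask, any_rows and all_rows are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'countries': [['US', 'UK'], ['UK'], ['Germany', 'France'], ['China']]})
countries = ['UK', 'US']

# Convert the filter list to a set once; its bound methods accept any iterable,
# so they can be mapped directly over the list-valued column.
wanted = set(countries)

# "Any" match: the row shares at least one country with the filter set.
any_mask = ~df.countries.map(wanted.isdisjoint)

# "All" match: the row contains every country in the filter set.
all_mask = df.countries.map(wanted.issubset)

any_rows = df[any_mask]
all_rows = df[all_mask]

print(any_rows)
print(all_rows)
```

If the goal is the behaviour described in the question (rows 0 and 1 for both filter lists), the "any" variant with isdisjoint is probably the one to use.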
Upvotes: 1