Reputation: 125
I have some data that I want to insert into a dataframe. The data has columns=['Title', 'Category']. For each Title I have one or more Categories, and I decided to store the categories as a list. So my df looks like this:
In [39]: title_cat_df
Out[39]:
Title Category
0 Title1 [Cat1, Cat2]
1 Title3 [Cat5]
2 Title2 [Cat3, Cat4]
...
...
...
However, I don't know if this is a pythonic/pandaionic(?!) approach, since I have run into problems such as looking for specific categories using isin:
In [41]: test_df['Category'].isin(cat_list)
Out[41]: TypeError: unhashable type: 'list'
What would be a better way to represent categories in this case, and hopefully be able to look for titles in a specific category or categories?
Upvotes: 1
Views: 83
Reputation: 862511
Convert the column to sets and use & for intersection with the list, which is also converted to a set:
cat_list = ['Cat1','Cat2', 'Cat4']
print (test_df['Category'].apply(set) & set(cat_list))
0 True
1 False
2 True
Name: Category, dtype: bool
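How pandas evaluates & between an object column and a plain set may depend on the pandas version, so here is a minimal sketch of an equivalent mask that does the overlap test explicitly for each row. It assumes the same data as in the question; the names wanted and mask are only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question
test_df = pd.DataFrame({
    'Title': ['Title1', 'Title3', 'Title2'],
    'Category': [['Cat1', 'Cat2'], ['Cat5'], ['Cat3', 'Cat4']],
})
cat_list = ['Cat1', 'Cat2', 'Cat4']
wanted = set(cat_list)

# True where the row's list shares at least one category with cat_list
mask = test_df['Category'].apply(lambda cats: not wanted.isdisjoint(cats))
print(mask)
```

The mask can be used for boolean indexing in the same way, e.g. test_df[mask].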
Finally, filter by boolean indexing:
test_df = test_df[test_df['Category'].apply(set) & set(cat_list)]
print (test_df)
Title Category
0 Title1 [Cat1, Cat2]
2 Title2 [Cat3, Cat4]
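As for a different representation, one option (not the only one) is a long format with one row per Title/Category pair, so that each cell holds a plain string and isin works directly. A rough sketch, assuming pandas 0.25 or later for DataFrame.explode; the names long_df, matches and titles are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the question
title_cat_df = pd.DataFrame({
    'Title': ['Title1', 'Title3', 'Title2'],
    'Category': [['Cat1', 'Cat2'], ['Cat5'], ['Cat3', 'Cat4']],
})
cat_list = ['Cat1', 'Cat2', 'Cat4']

# One row per (Title, Category) pair; the original index values are repeated
long_df = title_cat_df.explode('Category')

# Each cell is now a hashable string, so isin no longer raises TypeError
matches = long_df[long_df['Category'].isin(cat_list)]

# Titles that belong to at least one of the wanted categories
titles = matches['Title'].unique().tolist()
print(titles)
```

The trade-off is that titles are repeated across rows, so anything that should count each title once needs a drop_duplicates or unique step.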
Upvotes: 2