Markos di Mitsas

Reputation: 125

Using a list as a value in a pandas dataframe

I have some data that I want to insert into a dataframe with columns ['Title', 'Category']. Each title has one or more categories, so I decided to store the categories as a list. My df looks like this:

In [39]: title_cat_df
Out[39]: 
    Title      Category
0  Title1  [Cat1, Cat2]
1  Title3        [Cat5]
2  Title2  [Cat3, Cat4]
...
...
...

However, I don't know whether this is a pythonic/pandaionic(?!) approach, since I have run into problems such as looking for specific categories using isin:

In [41]: test_df['Category'].isin(cat_list)
Out[41]: TypeError: unhashable type: 'list'

What would be a better way to represent the categories in this case, so that I can look up titles in a specific category or set of categories?

Upvotes: 1

Views: 83

Answers (1)

jezrael

Reputation: 862511

Convert each list in the column to a set and intersect it with cat_list, also converted to a set. A non-empty intersection means the row matches at least one of the categories:

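cat_list = ['Cat1', 'Cat2', 'Cat4']
wanted = set(cat_list)
# True where a row shares at least one category with cat_list
print(test_df['Category'].apply(lambda cats: bool(set(cats) & wanted)))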
0     True
1    False
2     True
Name: Category, dtype: bool
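
To reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the three sample rows from the question; the names `wanted`, `mask` and `all_mask` are mine, not from the original post. It uses `isdisjoint` for an "any of these categories" match and, as an assumption about what might also be wanted, `<=` for a stricter "all categories are in the list" match.

```python
import pandas as pd

# Rebuild the three sample rows shown in the question.
test_df = pd.DataFrame({
    'Title': ['Title1', 'Title3', 'Title2'],
    'Category': [['Cat1', 'Cat2'], ['Cat5'], ['Cat3', 'Cat4']],
})

cat_list = ['Cat1', 'Cat2', 'Cat4']
wanted = set(cat_list)  # hypothetical helper name

# True where a row shares at least one category with cat_list.
mask = test_df['Category'].apply(lambda cats: not wanted.isdisjoint(cats))

# True where every category of the row is contained in cat_list.
all_mask = test_df['Category'].apply(lambda cats: set(cats) <= wanted)

print(mask.tolist())      # [True, False, True]
print(all_mask.tolist())  # [True, False, False]
```

Either mask can be passed to `test_df[...]` for boolean indexing.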

Finally, filter by boolean indexing:

test_df = test_df[test_df['Category'].apply(lambda cats: bool(set(cats) & set(cat_list)))]
print (test_df)
    Title      Category
0  Title1  [Cat1, Cat2]
2  Title2  [Cat3, Cat4]
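
On the question of a better representation: one option, which is a different technique from the set intersection above, is to keep the data in long format with one row per (Title, Category) pair. A rough sketch, assuming pandas 0.25 or later for `DataFrame.explode`; `long_df`, `matches` and `titles` are hypothetical names. With scalar values in the column, `isin` works directly because strings are hashable.

```python
import pandas as pd

# Same sample rows as in the question, with list-valued categories.
title_cat_df = pd.DataFrame({
    'Title': ['Title1', 'Title3', 'Title2'],
    'Category': [['Cat1', 'Cat2'], ['Cat5'], ['Cat3', 'Cat4']],
})

# One row per (Title, Category) pair; the original index is repeated.
long_df = title_cat_df.explode('Category')

cat_list = ['Cat1', 'Cat2', 'Cat4']

# isin now compares plain strings, so no "unhashable type" error.
matches = long_df[long_df['Category'].isin(cat_list)]

# Titles with at least one matching category, in first-seen order.
titles = matches['Title'].unique().tolist()
print(titles)  # ['Title1', 'Title2']
```

The trade-off is that titles repeat across rows, so anything that counts titles needs `unique()` or a `groupby` first.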

Upvotes: 2
