filtering items from list in df column

Question

I'm trying to figure out how to create a new column with values from another column where the column type is a list. I'm struggeling to select only specific values from the said list and not the entire list if the value meets the condition.

More specific I want to create a new column with the values that contain 'critic'. So the result should look like this:

ID        LIST_COLUMN                                   NEW COLUMN 
id1       ['moviecritic', 'movie', 'disney']            moviecritic
id2       ['musiccritic', 'Metallica', 'rock']          musiccritic
id3       ['bookcritic', 'young adult', 'literature']   bookcritic

I've tried following, but it returns a ValueError (Length of values (0) does not match length of index (3))

import pandas as pd
data = [['id1', ['moviecritic', 'movie', 'disney']],['id2',['musiccritic', 'Metallica', 'rock']],
        ['id3',['bookcritic', 'young adult', 'literature']]]
df = pd.DataFrame(data,columns=['ID','LIST_COLUMN'])
        
        
df['NEW_COLUMN'] = [x for x in row.df['LIST_COLUMN'] if 'critic' in x]

ALollz · Accepted Answer

You can explode the list then filter to the values you want and assign back (it will align on Index, which explode preserves). Since you will never have more than 1 value with 'critic' in it this will work.

s = df['LIST_COLUMN'].explode()
df['new_column'] = s[s.str.contains('critic')]

    ID                            LIST_COLUMN   new_column
0  id1           [moviecritic, movie, disney]  moviecritic
1  id2         [musiccritic, Metallica, rock]  musiccritic
2  id3  [bookcritic, young adult, literature]   bookcritic

If possibly multiple matches, you will need to somehow aggregate along the original Index, for instance you can str.join:

df['new_column'] = s[s.str.`contains('critic')].groupby(level=0).agg(','.join)

filtering items from list in df column

Answers (2)

Related Questions