Octavarium
Octavarium

Reputation: 131

filtering items from list in df column

I'm trying to figure out how to create a new column with values from another column where the column type is a list. I'm struggeling to select only specific values from the said list and not the entire list if the value meets the condition.

More specific I want to create a new column with the values that contain 'critic'. So the result should look like this:

ID        LIST_COLUMN                                   NEW COLUMN 
id1       ['moviecritic', 'movie', 'disney']            moviecritic
id2       ['musiccritic', 'Metallica', 'rock']          musiccritic
id3       ['bookcritic', 'young adult', 'literature']   bookcritic

I've tried following, but it returns a ValueError (Length of values (0) does not match length of index (3))

import pandas as pd
data = [['id1', ['moviecritic', 'movie', 'disney']],['id2',['musiccritic', 'Metallica', 'rock']],
        ['id3',['bookcritic', 'young adult', 'literature']]]
df = pd.DataFrame(data,columns=['ID','LIST_COLUMN'])
        
        
df['NEW_COLUMN'] = [x for x in row.df['LIST_COLUMN'] if 'critic' in x]

Upvotes: 2

Views: 53

Answers (2)

Mustafa Aydın
Mustafa Aydın

Reputation: 18306

You're very close, but another loop is needed to traverse the lists themselves, which we can do with next and short-circuit:

df["NEW_COLUMN"] = [next(seq for seq in a_list if "critic" in seq)
                    for a_list in df["LIST_COLUMN"]]

to get

    ID                            LIST_COLUMN   NEW_COLUMN
0  id1           [moviecritic, movie, disney]  moviecritic
1  id2         [musiccritic, Metallica, rock]  musiccritic
2  id3  [bookcritic, young adult, literature]   bookcritic

To provide a default value in case of no match in a row, you can change the next as:

next((seq for seq in a_list if "critic" in seq), None)

Upvotes: 3

ALollz
ALollz

Reputation: 59519

You can explode the list then filter to the values you want and assign back (it will align on Index, which explode preserves). Since you will never have more than 1 value with 'critic' in it this will work.

s = df['LIST_COLUMN'].explode()
df['new_column'] = s[s.str.contains('critic')]

    ID                            LIST_COLUMN   new_column
0  id1           [moviecritic, movie, disney]  moviecritic
1  id2         [musiccritic, Metallica, rock]  musiccritic
2  id3  [bookcritic, young adult, literature]   bookcritic

If possibly multiple matches, you will need to somehow aggregate along the original Index, for instance you can str.join:

df['new_column'] = s[s.str.`contains('critic')].groupby(level=0).agg(','.join)

Upvotes: 3

Related Questions