Reputation: 131
I'm trying to figure out how to create a new column with values from another column where the column type is a list. I'm struggeling to select only specific values from the said list and not the entire list if the value meets the condition.
More specific I want to create a new column with the values that contain 'critic'. So the result should look like this:
ID LIST_COLUMN NEW COLUMN
id1 ['moviecritic', 'movie', 'disney'] moviecritic
id2 ['musiccritic', 'Metallica', 'rock'] musiccritic
id3 ['bookcritic', 'young adult', 'literature'] bookcritic
I've tried following, but it returns a ValueError (Length of values (0) does not match length of index (3))
import pandas as pd
data = [['id1', ['moviecritic', 'movie', 'disney']],['id2',['musiccritic', 'Metallica', 'rock']],
['id3',['bookcritic', 'young adult', 'literature']]]
df = pd.DataFrame(data,columns=['ID','LIST_COLUMN'])
df['NEW_COLUMN'] = [x for x in row.df['LIST_COLUMN'] if 'critic' in x]
Upvotes: 2
Views: 53
Reputation: 18306
You're very close, but another loop is needed to traverse the lists themselves, which we can do with next
and short-circuit:
df["NEW_COLUMN"] = [next(seq for seq in a_list if "critic" in seq)
for a_list in df["LIST_COLUMN"]]
to get
ID LIST_COLUMN NEW_COLUMN
0 id1 [moviecritic, movie, disney] moviecritic
1 id2 [musiccritic, Metallica, rock] musiccritic
2 id3 [bookcritic, young adult, literature] bookcritic
To provide a default value in case of no match in a row, you can change the next
as:
next((seq for seq in a_list if "critic" in seq), None)
Upvotes: 3
Reputation: 59519
You can explode
the list then filter to the values you want and assign back (it will align on Index, which explode
preserves). Since you will never have more than 1 value with 'critic' in it this will work.
s = df['LIST_COLUMN'].explode()
df['new_column'] = s[s.str.contains('critic')]
ID LIST_COLUMN new_column
0 id1 [moviecritic, movie, disney] moviecritic
1 id2 [musiccritic, Metallica, rock] musiccritic
2 id3 [bookcritic, young adult, literature] bookcritic
If possibly multiple matches, you will need to somehow aggregate along the original Index, for instance you can str.join
:
df['new_column'] = s[s.str.`contains('critic')].groupby(level=0).agg(','.join)
Upvotes: 3