Reputation: 23
I want to create a new string column whose value depends on a condition. Example:
from pandas import DataFrame
GoT = {'Old_Group': ['Jon Snow', 'Sansa Stark','Arya Stark','Robb Stark','Theon Greyjoy' ]}
df = DataFrame(GoT,columns=['Old_Group'])
The "New_Group" column should check whether "Old_Group" contains the string "Stark" anywhere and, if so, be set to e.g. "Stark Family". If the value does not contain "Stark", "New_Group" should be set to e.g. "other".
In SQL I would do it this way:
Select Old_Group
,case when Old_Group like '%Stark%' then 'Stark Family' else 'other' end as New_Group
from df
Thank you
Upvotes: 2
Views: 3172
Reputation: 79288
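This option leaves out names in which Stark appears only as part of a longer word, such as MacStark. It is also case-insensitive. Note that only the matched word is replaced, so matching rows keep their first name (e.g. "Sansa Stark Family"):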
df.assign(New_Column=df.replace({r'(?i)^((?!\bStark\b).)*$':'Other',r'(?i)\bStark\b':'Stark Family'},regex=True))
Out[319]:
       Old_Group          New_Column
0       Jon Snow               Other
1    Sansa Stark  Sansa Stark Family
2     Arya Stark   Arya Stark Family
3     Robb Stark   Robb Stark Family
4  Theon Greyjoy               Other
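If the new column should hold only the label "Stark Family" rather than the original name with "Family" appended, the same word-boundary, case-insensitive match can be turned into a boolean mask. This is a minimal sketch, not the answer's original method; the sample rows (including "Bob MacStark") are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, chosen to exercise the edge cases
df = pd.DataFrame({'Old_Group': ['Jon Snow', 'Sansa Stark', 'arya stark', 'Bob MacStark']})

# \b requires "Stark" to be a whole word, so "MacStark" does not match;
# case=False makes the match case-insensitive, na=False treats NaN as no match
mask = df['Old_Group'].str.contains(r'\bStark\b', case=False, regex=True, na=False)

# Label matching rows, everything else becomes 'Other'
df['New_Column'] = np.where(mask, 'Stark Family', 'Other')
print(df)
```

Here "arya stark" is labelled "Stark Family" because of the case-insensitive match, while "Bob MacStark" falls into "Other".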
Upvotes: 1
Reputation: 1844
You can use a combination of np.where and str.contains to do this. Essentially, what you want to do is apply np.where to the column in question (Old_Group in this case) and check whether each string contains the word Stark.
import numpy as np
df['New_Group'] = np.where(df['Old_Group'].str.contains("Stark"), 'Stark Family', 'Other')
Just make sure the Old_Group column holds strings, since the .str accessor only works on string data.
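For reference, a self-contained version of this approach on the data from the question might look like the sketch below. The na=False argument is an addition that is not in the answer above; it assumes missing values should count as non-matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Old_Group': ['Jon Snow', 'Sansa Stark', 'Arya Stark',
                                 'Robb Stark', 'Theon Greyjoy']})

# Boolean mask: True where the name contains the substring "Stark".
# na=False (an assumption) treats missing values as non-matches.
is_stark = df['Old_Group'].str.contains('Stark', na=False)

# Equivalent of SQL's CASE WHEN ... THEN ... ELSE ... END
df['New_Group'] = np.where(is_stark, 'Stark Family', 'other')
print(df)
```

Because np.where works on the whole column at once, it tends to be faster than a row-by-row apply on large frames.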
Upvotes: 2
Reputation: 13426
You need:
df['New_Group'] = df['Old_Group'].apply(lambda x : 'Stark Family' if 'Stark' in x else 'other')
print(df)
Output
       Old_Group     New_Group
0       Jon Snow         other
1    Sansa Stark  Stark Family
2     Arya Stark  Stark Family
3     Robb Stark  Stark Family
4  Theon Greyjoy         other
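One caveat with the lambda: `'Stark' in x` raises a TypeError if a cell is not a string (for example a missing value). A possible guard, sketched with a hypothetical helper named classify and made-up sample rows:

```python
import pandas as pd

def classify(name):
    # Hypothetical helper: anything that is not a string (None, NaN, numbers)
    # falls into 'other' instead of raising a TypeError
    if isinstance(name, str) and 'Stark' in name:
        return 'Stark Family'
    return 'other'

# Made-up rows, including a missing value
df = pd.DataFrame({'Old_Group': ['Jon Snow', 'Sansa Stark', None]})
df['New_Group'] = df['Old_Group'].apply(classify)
print(df)
```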
Upvotes: 0