Reputation: 363
I have a data frame, want to create a column based on the string in column1_sport.
import pandas as pd
df = pd.read_csv('C:/Users/test/dataframe.csv', encoding = 'iso-8859-1')
Data contains:
column1_sport
baseball
basketball
tennis
boxing
golf
I want to look for certain strings ("ball" or "box") and create a new column based on whether the column contains that word. If the dataframe doesn't contain that word, add "other". See below.
column1_sport column2_type
baseball ball
basketball ball
tennis other
boxing box
golf other
Upvotes: 3
Views: 12011
Reputation: 1475
df["column2_type"] = df.column1_sport.apply(lambda x: "ball" if "ball" in x else ("box" if "box" in x else "Other"))
df
column1_sport column2_type
0 baseball ball
1 basketball ball
2 tennis Other
3 boxing box
4 golf Other
Incase you have more complex conditions
def func(a):
if "ball" in a.lower():
return "ball"
elif "box" in a.lower():
return "box"
else:
return "Other"
df["column2_type"] = df.column1_sport.apply(lambda x: func(x))
Upvotes: 2
Reputation: 164653
For multiple conditions I suggest np.select
. For example:
values = ['ball', 'box']
conditions = list(map(df['column1_sport'].str.contains, values))
df['column2_type'] = np.select(conditions, values, 'other')
print(df)
# column1_sport column2_type
# 0 baseball ball
# 1 basketball ball
# 2 tennis other
# 3 boxing box
# 4 golf other
Upvotes: 5
Reputation: 38415
You can use a nested np.where
cond1 = df.column1_sport.str.contains('ball')
cond2 = df.column1_sport.str.contains('box')
df['column2_type'] = np.where(cond1, 'ball', np.where(cond2, 'box', 'other') )
column1_sport column2_type
0 baseball ball
1 basketball ball
2 tennis other
3 boxing box
4 golf other
Upvotes: 0