Reputation: 5234
I have an 'Activity' column filled with strings, and I want to derive the values of an 'Activity_2' column from it using an if statement.
Activity_2 shows the desired result: essentially, I want to label what type of activity each row describes.
I tried to do this with the code below, but it won't run (the error is shown below). Any help is greatly appreciated!
for i in df2['Activity']:
    if i contains 'email':
        df2['Activity_2'] = 'email'
    elif i contains 'conference'
        df2['Activity_2'] = 'conference'
    elif i contains 'call'
        df2['Activity_2'] = 'call'
    else:
        df2['Activity_2'] = 'task'
Error:
    if i contains 'email':
         ^
SyntaxError: invalid syntax
Upvotes: 29
Views: 136170
Reputation: 757
Another solution can be found in a post by @unutbu; it works well for creating conditional columns. To match your question, I changed the condition from that post, df['Set'] == Z, to df['Activity'].str.contains('yourtext'). See an example below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call foo']})
conditions = [
df['Activity'].str.contains('email'),
df['Activity'].str.contains('conference'),
df['Activity'].str.contains('call')]
values = ['email', 'conference', 'call']
df['Activity_2'] = np.select(conditions, values, default='task')
print(df)
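np.select checks the conditions in list order and, for each row, takes the value of the first condition that is True; rows matching none of them get the default. The small sketch below (the example rows are made up) shows that a row mentioning both "email" and "call" is labeled email, because that condition comes first:

```python
import numpy as np
import pandas as pd

# Example rows are invented for illustration.
df = pd.DataFrame({'Activity': ['email person A', 'email about the call',
                                'attend conference', 'random text']})

conditions = [
    df['Activity'].str.contains('email'),
    df['Activity'].str.contains('conference'),
    df['Activity'].str.contains('call'),
]
values = ['email', 'conference', 'call']

# The first True condition wins; rows matching nothing fall back to the default.
df['Activity_2'] = np.select(conditions, values, default='task')
print(df['Activity_2'].tolist())
# ['email', 'email', 'conference', 'task']
```

So if one keyword should take priority over another, put its condition earlier in the list.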
You can find the original post here: Pandas conditional creation of a series/dataframe column
Upvotes: 1
Reputation: 1132
import pandas as pd

DEFAULT_ACTIVITY = 'task'

def assign_activity(todo_item):
    """Assign activity to raw text TODOs
    """
    activities = ['email', 'conference', 'call']
    for activity in activities:
        if activity in todo_item:
            return activity
    else:
        # Default value: no keyword was found
        return DEFAULT_ACTIVITY

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly'],
                   'Colleague': ['Knor', 'Koen', 'Hedge']})
# You should really come up with a better name than 'Activity_2', like 'Labels' or something.
df["Activity_2"] = df["Activity"].apply(assign_activity)
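The in check inside assign_activity assumes every value is a string. If the column can contain missing values, NaN is a float, so an expression like 'email' in todo_item raises a TypeError. The minimal sketch below adds an isinstance guard, assuming missing entries should simply fall back to the default; assign_activity_safe is a hypothetical name, not part of the answer above:

```python
import numpy as np
import pandas as pd

DEFAULT_ACTIVITY = 'task'
ACTIVITIES = ['email', 'conference', 'call']

def assign_activity_safe(todo_item):
    """Return the first activity keyword found in todo_item, else the default.

    Non-string values such as NaN are treated as containing no keyword
    (an assumption; give them their own label if you need to tell them apart).
    """
    if not isinstance(todo_item, str):
        return DEFAULT_ACTIVITY
    for activity in ACTIVITIES:
        if activity in todo_item:
            return activity
    return DEFAULT_ACTIVITY

df = pd.DataFrame({'Activity': ['email person A', np.nan, 'call Charly']})
df['Activity_2'] = df['Activity'].apply(assign_activity_safe)
print(df['Activity_2'].tolist())
# ['email', 'task', 'call']
```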
Upvotes: 2
Reputation: 1026
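The nested np.where solution does not handle NaN values in the Activity column: str.contains returns NaN instead of True/False for those rows, so they are not classified correctly. In that case I recommend the following code, which worked for me: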
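import numpy as np

temp = df.Activity.fillna("0")
# Flag the originally missing rows directly; testing temp for "0" would also
# catch real activity text that happens to contain a zero.
df['Activity_2'] = np.where(df.Activity.isna(), "None",
                   np.where(temp.str.contains("email"), "email",
                   np.where(temp.str.contains("conference"), "conference",
                   np.where(temp.str.contains("call"), "call", "task"))))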
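Another option is to pass na=False to str.contains, which makes missing values count as "no match", so no placeholder string is needed. The sketch below combines that with np.select and checks isna() first, assuming missing rows should be labeled "None" as in the answer above (the example rows are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Activity': ['email person A', np.nan,
                                'call 0800 number', 'random text']})

activity = df['Activity']
conditions = [
    activity.isna(),                                # missing rows are checked first
    activity.str.contains('email', na=False),       # NaN is treated as False
    activity.str.contains('conference', na=False),
    activity.str.contains('call', na=False),
]
values = ['None', 'email', 'conference', 'call']

df['Activity_2'] = np.select(conditions, values, default='task')
print(df['Activity_2'].tolist())
# ['email', 'None', 'call', 'task']
```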
Upvotes: 12
Reputation: 2139
This also works:
df.loc[df['Activity'].str.contains('email'), 'Activity_2'] = 'email'
df.loc[df['Activity'].str.contains('conference'), 'Activity_2'] = 'conference'
df.loc[df['Activity'].str.contains('call'), 'Activity_2'] = 'call'
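With the .loc approach, rows that match none of the patterns are left as NaN rather than 'task', and when a row matches several patterns the last assignment wins (the opposite of nested np.where, where the first match wins). The minimal sketch below sets the 'task' default first and orders the assignments so that 'email' takes priority; that precedence is an assumption, so reorder the list if you want a different one:

```python
import pandas as pd

df = pd.DataFrame({'Activity': ['email person A', 'attend conference',
                                'email about the call', 'random text']})

# Start every row at the default label.
df['Activity_2'] = 'task'

# Later assignments overwrite earlier ones, so the highest-priority keyword goes last.
for keyword in ['call', 'conference', 'email']:
    df.loc[df['Activity'].str.contains(keyword), 'Activity_2'] = keyword

print(df['Activity_2'].tolist())
# ['email', 'conference', 'email', 'task']
```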
Upvotes: 14
Reputation: 3409
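You have invalid syntax for checking strings: Python has no contains keyword, so use the in operator instead. Also assign per row with .loc; otherwise each assignment overwrites the whole column.
Try using:
for idx, i in df2['Activity'].items():
    if 'email' in i:
        df2.loc[idx, 'Activity_2'] = 'email'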
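In the question's loop, every df2['Activity_2'] = ... assignment overwrites the entire column, so the result would depend only on the last row. If you want to keep the explicit if/elif/else from the question, one minimal sketch is to collect one label per row in a list and assign the column once at the end (df2 here is a made-up example frame):

```python
import pandas as pd

df2 = pd.DataFrame({'Activity': ['email person A', 'attend conference',
                                 'call Sam', 'random text']})

labels = []
for text in df2['Activity']:
    # Use the `in` operator to test for a substring.
    if 'email' in text:
        labels.append('email')
    elif 'conference' in text:
        labels.append('conference')
    elif 'call' in text:
        labels.append('call')
    else:
        labels.append('task')

# Assign the whole column once, one label per row.
df2['Activity_2'] = labels
print(df2)
```

This is fine for small frames; for large ones, the vectorized str.contains approaches in the other answers avoid the Python-level loop.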
Upvotes: 3
Reputation: 214957
I assume you are using pandas; then you can use numpy.where, which is a vectorized version of if/else, with the condition constructed by str.contains:
import numpy as np

df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))
df
#            Activity  Activity_2
#0      email personA       email
#1  attend conference  conference
#2         send email       email
#3           call Sam        call
#4        random text        task
#5        random text        task
#6       lwantto call        call
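The DataFrame behind the output above isn't shown. The self-contained sketch below rebuilds it from the printed Activity column (assuming those are the exact strings) and imports numpy directly, since the pd.np alias was deprecated in pandas 1.0 and has since been removed:

```python
import numpy as np
import pandas as pd

# Rebuilt from the printed output above (assumed to be the exact strings).
df = pd.DataFrame({'Activity': ['email personA', 'attend conference', 'send email',
                                'call Sam', 'random text', 'random text',
                                'lwantto call']})

# Nested np.where: the outermost condition is checked first, like an if/elif chain.
df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))
print(df)
```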
Upvotes: 39