PineNuts0
PineNuts0

Reputation: 5234

Conditional If Statement: If value in row contains string ... set another column equal to string

EDIT MADE:

I have the 'Activity' column filled with strings and I want to derive the values in the 'Activity_2' column using an if statement.

So Activity_2 shows the desired result. Essentially I want to call out what type of activity is occurring.

I tried to do this using my code below but it won't run (please see screen shot below for error). Any help is greatly appreciated!

enter image description here

    for i in df2['Activity']:
        if i contains 'email':
            df2['Activity_2'] = 'email'
        elif i contains 'conference'
            df2['Activity_2'] = 'conference'
        elif i contains 'call'
            df2['Activity_2'] = 'call'
        else:
            df2['Activity_2'] = 'task'


Error: if i contains 'email':
                ^
SyntaxError: invalid syntax

Upvotes: 29

Views: 136170

Answers (6)

Hedge92
Hedge92

Reputation: 757

Another solution can be found in a post made by @unutbu. This also works great for creating conditional columns. I changed the example from that post df['Set'] == Z to match your question to df['Activity'].str.contains('yourtext'). See an example below:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call foo']})

conditions = [
    df['Activity'].str.contains('email'),
    df['Activity'].str.contains('conference'),
    df['Activity'].str.contains('call')]

values = ['email', 'conference', 'call']

df['Activity_2'] = np.select(conditions, values, default='task')

print(df)

You can find the original post here: Pandas conditional creation of a series/dataframe column

Upvotes: 1

Dave Liu
Dave Liu

Reputation: 1132

  1. Your code had bugs- no colons on "elif" lines.
  2. You didn't mention you were using Pandas, but that's the assumption I'm going with.
  3. My answer handles defaults, uses proper Python conventions, is the most efficient, up-to-date, and easily adaptable for additional activities.

DEFAULT_ACTIVITY = 'task'


def assign_activity(todo_item):
    """Assign activity to raw text TODOs
    """
    activities = ['email', 'conference', 'call']

    for activity in activities:
        if activity in todo_item:
            return activity
        else:
            # Default value
            return DEFAULT_ACTIVITY

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly'],
                   'Colleague': ['Knor', 'Koen', 'Hedge']})

# You should really come up with a better name than 'Activity_2', like 'Labels' or something.
df["Activity_2] = df["Activity"].apply(assign_activity)

Upvotes: 2

DovaX
DovaX

Reputation: 1026

The current solution behaves wrongly if your df contains NaN values. In that case I recommend using the following code which worked for me

temp=df.Activity.fillna("0")
df['Activity_2'] = pd.np.where(temp.str.contains("0"),"None",
                   pd.np.where(temp.str.contains("email"), "email",
                   pd.np.where(temp.str.contains("conference"), "conference",
                   pd.np.where(temp.str.contains("call"), "call", "task"))))

Upvotes: 12

moshfiqur
moshfiqur

Reputation: 2139

This also works:

df.loc[df['Activity'].str.contains('email'), 'Activity_2'] = 'email'
df.loc[df['Activity'].str.contains('conference'), 'Activity_2'] = 'conference'
df.loc[df['Activity'].str.contains('call'), 'Activity_2'] = 'call'

Upvotes: 14

Prakash Palnati
Prakash Palnati

Reputation: 3409

you have an invalid syntax for checking strings.

try using

 for i in df2['Activity']:
        if 'email' in i :
            df2['Activity_2'] = 'email'

Upvotes: 3

akuiper
akuiper

Reputation: 214957

I assume you are using pandas, then you can use numpy.where, which is a vectorized version of if/else, with the condition constructed by str.contains:

df['Activity_2'] = pd.np.where(df.Activity.str.contains("email"), "email",
                   pd.np.where(df.Activity.str.contains("conference"), "conference",
                   pd.np.where(df.Activity.str.contains("call"), "call", "task")))

df

#   Activity            Activity_2
#0  email personA       email
#1  attend conference   conference
#2  send email          email
#3  call Sam            call
#4  random text         task
#5  random text         task
#6  lwantto call        call

Upvotes: 39

Related Questions