PKN
PKN

Reputation: 25

To create condition based column using Python pandas library

I am trying to put together the following data frame in pandas. Initially I have 3 columns. I want to create a new column called "Status" based on condition, if Days_between > 7 then "New course" else "Existing Course" for particular ID

I tried groupby with if else condition but could not put the logic correct. Any help would be appreciated.

ID Date days_between Status
1 8/21/2014 0 Existing Course
1 8/22/2014 1 Existing Course
1 8/25/2014 3 Existing Course
1 8/26/2014 1 Existing Course
1 8/27/2014 1 Existing Course
1 8/28/2014 1 Existing Course
1 8/29/2014 1 Existing Course
1 9/2/2014 4 Existing Course
1 9/2/2014 0 Existing Course
1 9/3/2014 1 Existing Course
1 9/4/2014 1 Existing Course
1 9/5/2014 1 Existing Course
1 9/8/2014 3 Existing Course
1 9/9/2014 1 Existing Course
1 9/10/2014 1 Existing Course
1 9/11/2014 1 Existing Course
1 9/12/2014 1 Existing Course
1 9/15/2014 3 Existing Course
1 9/16/2014 1 Existing Course
1 9/18/2014 2 Existing Course
1 9/18/2014 0 Existing Course
1 9/19/2014 1 Existing Course
1 9/22/2014 3 Existing Course
1 9/23/2014 1 Existing Course
1 9/25/2014 2 Existing Course
1 9/25/2014 0 Existing Course
1 9/29/2014 4 Existing Course
1 9/30/2014 1 Existing Course
1 10/1/2014 1 Existing Course
1 10/2/2014 1 Existing Course
1 3/9/2016 524 New Course
1 3/10/2016 1 Existing Course
1 3/11/2016 1 Existing Course
1 3/14/2016 3 Existing Course
1 3/16/2016 2 Existing Course
1 3/18/2016 2 Existing Course
1 3/21/2016 3 Existing Course
2 9/15/2014 0 Existing Course
2 9/16/2014 1 Existing Course
2 9/17/2014 1 Existing Course
2 9/18/2014 1 Existing Course
2 9/18/2014 0 Existing Course
2 9/19/2014 1 Existing Course
2 9/19/2014 0 Existing Course
2 9/24/2014 5 Existing Course
2 9/25/2014 1 Existing Course
2 9/29/2014 4 Existing Course
2 9/30/2014 1 Existing Course
2 10/1/2014 1 Existing Course
2 10/2/2014 1 Existing Course
2 10/3/2014 1 Existing Course
2 10/6/2014 3 Existing Course
2 10/7/2014 1 Existing Course
2 10/9/2014 2 Existing Course

Upvotes: 1

Views: 39

Answers (3)

jrbergen
jrbergen

Reputation: 658

This probably isn't the most efficient or canonical way to do this (I think using something akin a boolean mask would be more efficient) but this should work:

df['Status'] = df.apply(lambda row: 'New Course' if row['days_between'] > 7 else 'Existing Course', axis=1)

The lambda function is applied on a per-row basis.

Upvotes: 0

user5386938
user5386938

Reputation:

Slightly different take...

import pandas as pd
from io import StringIO

data = '''\
ID  Date    days_between
1   8/21/2014   0
1   8/22/2014   8
1   8/25/2014   3
1   8/26/2014   9

'''

f = StringIO(data)
df = pd.read_csv(f, sep='\t')

print(df)

df['Status'] = df.apply(lambda r: 'New course'
                        if r.days_between > 7 else 'Existing Course',
                        axis=1)

print(df)

Upvotes: 0

Joe Thor
Joe Thor

Reputation: 1260

Try applying the logic via lambda function


df['status'] = df['days_between'].apply(lambda x: 'New course' if x > 7 else 'Existing Course')

Upvotes: 1

Related Questions