I am trying to put together the following data frame in pandas. Initially I have 3 columns. I want to create a new column called "Status" based on condition, if Days_between > 7 then "New course" else "Existing Course" for particular ID I tried groupby with if else condition but could not put the logic correct. Any help would be appreciated. ID Date days_between Status 1 8/21/2014 0 Existing Course 1 8/22/2014 1 Existing Course 1 8/25/2014 3 Existing Course 1 8/26/2014 1 Existing Course 1 8/27/2014 1 Existing Course 1 8/28/2014 1 Existing Course 1 8/29/2014 1 Existing Course 1 9/2/2014 4 Existing Course 1 9/2/2014 0 Existing Course 1 9/3/2014 1 Existing Course 1 9/4/2014 1 Existing Course 1 9/5/2014 1 Existing Course 1 9/8/2014 3 Existing Course 1 9/9/2014 1 Existing Course 1 9/10/2014 1 Existing Course 1 9/11/2014 1 Existing Course 1 9/12/2014 1 Existing Course 1 9/15/2014 3 Existing Course 1 9/16/2014 1 Existing Course 1 9/18/2014 2 Existing Course 1 9/18/2014 0 Existing Course 1 9/19/2014 1 Existing Course 1 9/22/2014 3 Existing Course 1 9/23/2014 1 Existing Course 1 9/25/2014 2 Existing Course 1 9/25/2014 0 Existing Course 1 9/29/2014 4 Existing Course 1 9/30/2014 1 Existing Course 1 10/1/2014 1 Existing Course 1 10/2/2014 1 Existing Course 1 3/9/2016 524 New Course 1 3/10/2016 1 Existing Course 1 3/11/2016 1 Existing Course 1 3/14/2016 3 Existing Course 1 3/16/2016 2 Existing Course 1 3/18/2016 2 Existing Course 1 3/21/2016 3 Existing Course 2 9/15/2014 0 Existing Course 2 9/16/2014 1 Existing Course 2 9/17/2014 1 Existing Course 2 9/18/2014 1 Existing Course 2 9/18/2014 0 Existing Course 2 9/19/2014 1 Existing Course 2 9/19/2014 0 Existing Course 2 9/24/2014 5 Existing Course 2 9/25/2014 1 Existing Course 2 9/29/2014 4 Existing Course 2 9/30/2014 1 Existing Course 2 10/1/2014 1 Existing Course 2 10/2/2014 1 Existing Course 2 10/3/2014 1 Existing Course 2 10/6/2014 3 Existing Course 2 10/7/2014 1 Existing Course 2 10/9/2014 2 Existing Course

Reputation: 25

To create condition based column using Python pandas library

I am trying to put together the following data frame in pandas. Initially I have 3 columns. I want to create a new column called "Status" based on condition, if Days_between > 7 then "New course" else "Existing Course" for particular ID

I tried groupby with if else condition but could not put the logic correct. Any help would be appreciated.

ID	Date	days_between	Status
1	8/21/2014	0	Existing Course
1	8/22/2014	1	Existing Course
1	8/25/2014	3	Existing Course
1	8/26/2014	1	Existing Course
1	8/27/2014	1	Existing Course
1	8/28/2014	1	Existing Course
1	8/29/2014	1	Existing Course
1	9/2/2014	4	Existing Course
1	9/2/2014	0	Existing Course
1	9/3/2014	1	Existing Course
1	9/4/2014	1	Existing Course
1	9/5/2014	1	Existing Course
1	9/8/2014	3	Existing Course
1	9/9/2014	1	Existing Course
1	9/10/2014	1	Existing Course
1	9/11/2014	1	Existing Course
1	9/12/2014	1	Existing Course
1	9/15/2014	3	Existing Course
1	9/16/2014	1	Existing Course
1	9/18/2014	2	Existing Course
1	9/18/2014	0	Existing Course
1	9/19/2014	1	Existing Course
1	9/22/2014	3	Existing Course
1	9/23/2014	1	Existing Course
1	9/25/2014	2	Existing Course
1	9/25/2014	0	Existing Course
1	9/29/2014	4	Existing Course
1	9/30/2014	1	Existing Course
1	10/1/2014	1	Existing Course
1	10/2/2014	1	Existing Course
1	3/9/2016	524	New Course
1	3/10/2016	1	Existing Course
1	3/11/2016	1	Existing Course
1	3/14/2016	3	Existing Course
1	3/16/2016	2	Existing Course
1	3/18/2016	2	Existing Course
1	3/21/2016	3	Existing Course
2	9/15/2014	0	Existing Course
2	9/16/2014	1	Existing Course
2	9/17/2014	1	Existing Course
2	9/18/2014	1	Existing Course
2	9/18/2014	0	Existing Course
2	9/19/2014	1	Existing Course
2	9/19/2014	0	Existing Course
2	9/24/2014	5	Existing Course
2	9/25/2014	1	Existing Course
2	9/29/2014	4	Existing Course
2	9/30/2014	1	Existing Course
2	10/1/2014	1	Existing Course
2	10/2/2014	1	Existing Course
2	10/3/2014	1	Existing Course
2	10/6/2014	3	Existing Course
2	10/7/2014	1	Existing Course
2	10/9/2014	2	Existing Course

Upvotes: 1

Answers (3)

jrbergen

Reputation: 658

This probably isn't the most efficient or canonical way to do this (I think using something akin a boolean mask would be more efficient) but this should work:

df['Status'] = df.apply(lambda row: 'New Course' if row['days_between'] > 7 else 'Existing Course', axis=1)

The lambda function is applied on a per-row basis.

Upvotes: 0

user5386938

Reputation:

Slightly different take...

import pandas as pd
from io import StringIO

data = '''\
ID  Date    days_between
1   8/21/2014   0
1   8/22/2014   8
1   8/25/2014   3
1   8/26/2014   9

'''

f = StringIO(data)
df = pd.read_csv(f, sep='\t')

print(df)

df['Status'] = df.apply(lambda r: 'New course'
                        if r.days_between > 7 else 'Existing Course',
                        axis=1)

print(df)

Upvotes: 0

Joe Thor

Reputation: 1260

Try applying the logic via lambda function


df['status'] = df['days_between'].apply(lambda x: 'New course' if x > 7 else 'Existing Course')

Upvotes: 1

To create condition based column using Python pandas library

Answers (3)

Related Questions