Reputation: 25
I am trying to put together the following data frame in pandas. Initially I have 3 columns. I want to create a new column called "Status" based on condition, if Days_between > 7 then "New course" else "Existing Course" for particular ID
I tried groupby with if else condition but could not put the logic correct. Any help would be appreciated.
ID | Date | days_between | Status |
---|---|---|---|
1 | 8/21/2014 | 0 | Existing Course |
1 | 8/22/2014 | 1 | Existing Course |
1 | 8/25/2014 | 3 | Existing Course |
1 | 8/26/2014 | 1 | Existing Course |
1 | 8/27/2014 | 1 | Existing Course |
1 | 8/28/2014 | 1 | Existing Course |
1 | 8/29/2014 | 1 | Existing Course |
1 | 9/2/2014 | 4 | Existing Course |
1 | 9/2/2014 | 0 | Existing Course |
1 | 9/3/2014 | 1 | Existing Course |
1 | 9/4/2014 | 1 | Existing Course |
1 | 9/5/2014 | 1 | Existing Course |
1 | 9/8/2014 | 3 | Existing Course |
1 | 9/9/2014 | 1 | Existing Course |
1 | 9/10/2014 | 1 | Existing Course |
1 | 9/11/2014 | 1 | Existing Course |
1 | 9/12/2014 | 1 | Existing Course |
1 | 9/15/2014 | 3 | Existing Course |
1 | 9/16/2014 | 1 | Existing Course |
1 | 9/18/2014 | 2 | Existing Course |
1 | 9/18/2014 | 0 | Existing Course |
1 | 9/19/2014 | 1 | Existing Course |
1 | 9/22/2014 | 3 | Existing Course |
1 | 9/23/2014 | 1 | Existing Course |
1 | 9/25/2014 | 2 | Existing Course |
1 | 9/25/2014 | 0 | Existing Course |
1 | 9/29/2014 | 4 | Existing Course |
1 | 9/30/2014 | 1 | Existing Course |
1 | 10/1/2014 | 1 | Existing Course |
1 | 10/2/2014 | 1 | Existing Course |
1 | 3/9/2016 | 524 | New Course |
1 | 3/10/2016 | 1 | Existing Course |
1 | 3/11/2016 | 1 | Existing Course |
1 | 3/14/2016 | 3 | Existing Course |
1 | 3/16/2016 | 2 | Existing Course |
1 | 3/18/2016 | 2 | Existing Course |
1 | 3/21/2016 | 3 | Existing Course |
2 | 9/15/2014 | 0 | Existing Course |
2 | 9/16/2014 | 1 | Existing Course |
2 | 9/17/2014 | 1 | Existing Course |
2 | 9/18/2014 | 1 | Existing Course |
2 | 9/18/2014 | 0 | Existing Course |
2 | 9/19/2014 | 1 | Existing Course |
2 | 9/19/2014 | 0 | Existing Course |
2 | 9/24/2014 | 5 | Existing Course |
2 | 9/25/2014 | 1 | Existing Course |
2 | 9/29/2014 | 4 | Existing Course |
2 | 9/30/2014 | 1 | Existing Course |
2 | 10/1/2014 | 1 | Existing Course |
2 | 10/2/2014 | 1 | Existing Course |
2 | 10/3/2014 | 1 | Existing Course |
2 | 10/6/2014 | 3 | Existing Course |
2 | 10/7/2014 | 1 | Existing Course |
2 | 10/9/2014 | 2 | Existing Course |
Upvotes: 1
Views: 39
Reputation: 658
This probably isn't the most efficient or canonical way to do this (I think using something akin a boolean mask would be more efficient) but this should work:
df['Status'] = df.apply(lambda row: 'New Course' if row['days_between'] > 7 else 'Existing Course', axis=1)
The lambda function is applied on a per-row basis.
Upvotes: 0
Reputation:
Slightly different take...
import pandas as pd
from io import StringIO
data = '''\
ID Date days_between
1 8/21/2014 0
1 8/22/2014 8
1 8/25/2014 3
1 8/26/2014 9
'''
f = StringIO(data)
df = pd.read_csv(f, sep='\t')
print(df)
df['Status'] = df.apply(lambda r: 'New course'
if r.days_between > 7 else 'Existing Course',
axis=1)
print(df)
Upvotes: 0
Reputation: 1260
Try applying the logic via lambda function
df['status'] = df['days_between'].apply(lambda x: 'New course' if x > 7 else 'Existing Course')
Upvotes: 1