Danish
Danish

Reputation: 2871

create a new columns by adding minutes to date time column and another column by groupby row number - in Pandas

I have a data frame as shown below

session          appt_time             
s1               2020-01-06 09:00:00   
s1               2020-01-06 09:20:00   
s1               2020-01-06 09:40:00   
s1               2020-01-06 09:40:00
s1               2020-01-06 10:00:00
s1               2020-01-06 10:00:00
s2               2020-01-06 08:20:00
s2               2020-01-06 08:40:00
s2               2020-01-06 08:40:00
s2               2020-01-06 09:00:00
s2               2020-01-06 09:20:00
s2               2020-01-06 09:40:00
s2               2020-01-06 09:40:00
s2               2020-01-06 10:00:00
s3               2020-01-09 13:00:00
s3               2020-01-09 13:00:00
s3               2020-01-09 13:20:00
s3               2020-01-09 13:40:00

From the above I would like to create a new columns called ideal_appt_time and slot_num as shown below.

session          appt_time              ideal_appt_time         slot_num
s1               2020-01-06 09:00:00    2020-01-06 09:00:00     1
s1               2020-01-06 09:20:00    2020-01-06 09:20:00     2
s1               2020-01-06 09:40:00    2020-01-06 09:40:00     3
s1               2020-01-06 09:40:00    2020-01-06 10:00:00     4
s1               2020-01-06 10:00:00    2020-01-06 10:20:00     5
s1               2020-01-06 10:00:00    2020-01-06 10:40:00     6 
s2               2020-01-06 08:20:00    2020-01-06 08:20:00     1
s2               2020-01-06 08:40:00    2020-01-06 08:40:00     2
s2               2020-01-06 08:40:00    2020-01-06 09:00:00     3
s2               2020-01-06 09:00:00    2020-01-06 09:20:00     4
s2               2020-01-06 09:20:00    2020-01-06 09:40:00     5
s2               2020-01-06 09:40:00    2020-01-06 10:00:00     6
s2               2020-01-06 09:40:00    2020-01-06 10:20:00     7
s2               2020-01-06 10:00:00    2020-01-06 10:40:00     8
s3               2020-01-09 13:00:00    2020-01-09 13:00:00     1
s3               2020-01-09 13:00:00    2020-01-09 13:20:00     2
s3               2020-01-09 13:20:00    2020-01-09 13:40:00     3
s3               2020-01-09 13:40:00    2020-01-09 14:00:00     4

Explanation:

where ideal_appt_time is calculated based on appt_time, start of ideal_appt_time is same as appt_time. then adding 20 minutes to that, where as in appt_time some appt_time are repeating.

slot_num just counted the slot of that session based on appoitment time.

Upvotes: 1

Views: 31

Answers (1)

jezrael
jezrael

Reputation: 862511

Use GroupBy.cumcount for counter Series, converted to timedeltas by to_timedelta and multiple 20 for 20 Minutes.

Then get first timestamp per group by GroupBy.transform and GroupBy.first, add timedeltas and last for counter column add 1:

df['appt_time'] = pd.to_datetime(df['appt_time'])

counts = df.groupby('session').cumcount()  
td = pd.to_timedelta(counts, unit='Min') * 20
df['ideal_appt_time'] = df.groupby('session')['appt_time'].transform('first') + td
df['slot_num'] = counts + 1
print (df)
   session           appt_time     ideal_appt_time  slot_num
0       s1 2020-01-06 09:00:00 2020-01-06 09:00:00         1
1       s1 2020-01-06 09:20:00 2020-01-06 09:20:00         2
2       s1 2020-01-06 09:40:00 2020-01-06 09:40:00         3
3       s1 2020-01-06 09:40:00 2020-01-06 10:00:00         4
4       s1 2020-01-06 10:00:00 2020-01-06 10:20:00         5
5       s1 2020-01-06 10:00:00 2020-01-06 10:40:00         6
6       s2 2020-01-06 08:20:00 2020-01-06 08:20:00         1
7       s2 2020-01-06 08:40:00 2020-01-06 08:40:00         2
8       s2 2020-01-06 08:40:00 2020-01-06 09:00:00         3
9       s2 2020-01-06 09:00:00 2020-01-06 09:20:00         4
10      s2 2020-01-06 09:20:00 2020-01-06 09:40:00         5
11      s2 2020-01-06 09:40:00 2020-01-06 10:00:00         6
12      s2 2020-01-06 09:40:00 2020-01-06 10:20:00         7
13      s2 2020-01-06 10:00:00 2020-01-06 10:40:00         8
14      s3 2020-01-09 13:00:00 2020-01-09 13:00:00         1
15      s3 2020-01-09 13:00:00 2020-01-09 13:20:00         2
16      s3 2020-01-09 13:20:00 2020-01-09 13:40:00         3
17      s3 2020-01-09 13:40:00 2020-01-09 14:00:00         4

Upvotes: 1

Related Questions