Reputation: 471
Update: I have a large pandas dataframe with admitTime, dischargeTime, pat_name, pat_rec and it has around 5 million records. I am trying to forward fill the columns dischargeTime, pat_name, based on the dischargeTime datetime value for rest of the columns and break after that.
df:
admitTime dischargeTime pat_name pat_rec
2013-12-23 20:20:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:00:30
2013-12-23 21:01:00
2013-12-23 21:01:30
2013-12-23 21:02:00
2013-12-23 21:02:30
2013-12-23 21:03:00
2013-12-23 21:03:30
2013-12-23 21:04:00
2013-12-23 21:04:30
2013-12-23 21:05:00
2013-12-23 21:05:30
2013-12-23 21:06:00
2013-12-23 21:06:30
2013-12-23 21:07:00
2013-12-23 21:07:30
2013-12-23 21:08:00
2013-12-23 21:08:30
2013-12-23 21:09:00
2013-12-23 21:09:30
2013-12-23 21:10:00
2013-12-23 21:10:30
2013-12-23 21:11:00
2013-12-23 21:11:30
2013-12-23 21:12:00
2013-12-23 21:12:30
2013-12-23 21:13:00
2013-12-23 21:13:30
2013-12-23 21:14:00 2013-12-21:18:00 Sam A4523
2013-12-23 21:14:30
2013-12-23 21:15:00
2013-12-23 21:15:30
2013-12-23 21:16:00
2013-12-23 21:16:30
2013-12-23 21:17:00
2013-12-23 21:17:30
2013-12-23 21:18:00
2013-12-23 21:18:30
2013-12-23 21:19:00
2013-12-23 21:19:30
2013-12-23 21:20:00
Ideally I'd like my df to look like
datetime discchargeTime pat_name pat_rec
2013-12-23 20:20:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:00:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:01:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:01:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:02:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:02:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:03:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:03:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:04:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:04:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:05:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:05:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:06:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:06:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:07:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:07:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:08:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:08:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:09:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:09:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:10:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:10:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:11:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:11:30 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:12:00 2013-12-23 21:12:00 Alex A4536
2013-12-23 21:12:30
2013-12-23 21:13:00
2013-12-23 21:13:30
2013-12-23 21:14:00 2013-12-21:18:00 Sam A4523
2013-12-23 21:14:30 2013-12-21:18:00 Sam A4523
2013-12-23 21:15:00 2013-12-21:18:00 Sam A4523
2013-12-23 21:15:30 2013-12-21:18:00 Sam A4523
2013-12-23 21:16:00 2013-12-21:18:00 Sam A4523
2013-12-23 21:16:30 2013-12-21:18:00 Sam A4523
2013-12-23 21:17:00 2013-12-21:18:00 Sam A4523
2013-12-23 21:17:30 2013-12-21:18:00 Sam A4523
2013-12-23 21:18:00 2013-12-21:18:00 Sam A4523
2013-12-23 21:18:30
2013-12-23 21:19:00
2013-12-23 21:19:30
2013-12-23 21:20:00
I tried df[column_name].ffill()
but later realized its not the right thing to do.
I would really appreciate if I can get any suggestions.
Upvotes: 3
Views: 940
Reputation: 75080
You could use the below :
mask = df['admitTime'] > df['dischargeTime'].iloc[0] #masking where admit time is greater than discharge time
pd.concat([df[~mask].ffill(),df[mask]]) #ffill the remaining and concat with mask
admitTime dischargeTime pat_name pat_rec
0 2013-12-23 20:20:30 2013-12-23 21:12:00 Alex A4536
1 2013-12-23 21:00:30 2013-12-23 21:12:00 Alex A4536
2 2013-12-23 21:01:00 2013-12-23 21:12:00 Alex A4536
3 2013-12-23 21:01:30 2013-12-23 21:12:00 Alex A4536
4 2013-12-23 21:02:00 2013-12-23 21:12:00 Alex A4536
5 2013-12-23 21:02:30 2013-12-23 21:12:00 Alex A4536
6 2013-12-23 21:03:00 2013-12-23 21:12:00 Alex A4536
7 2013-12-23 21:03:30 2013-12-23 21:12:00 Alex A4536
8 2013-12-23 21:04:00 2013-12-23 21:12:00 Alex A4536
9 2013-12-23 21:04:30 2013-12-23 21:12:00 Alex A4536
10 2013-12-23 21:05:00 2013-12-23 21:12:00 Alex A4536
11 2013-12-23 21:05:30 2013-12-23 21:12:00 Alex A4536
12 2013-12-23 21:06:00 2013-12-23 21:12:00 Alex A4536
13 2013-12-23 21:06:30 2013-12-23 21:12:00 Alex A4536
14 2013-12-23 21:07:00 2013-12-23 21:12:00 Alex A4536
15 2013-12-23 21:07:30 2013-12-23 21:12:00 Alex A4536
16 2013-12-23 21:08:00 2013-12-23 21:12:00 Alex A4536
17 2013-12-23 21:08:30 2013-12-23 21:12:00 Alex A4536
18 2013-12-23 21:09:00 2013-12-23 21:12:00 Alex A4536
19 2013-12-23 21:09:30 2013-12-23 21:12:00 Alex A4536
20 2013-12-23 21:10:00 2013-12-23 21:12:00 Alex A4536
21 2013-12-23 21:10:30 2013-12-23 21:12:00 Alex A4536
22 2013-12-23 21:11:00 2013-12-23 21:12:00 Alex A4536
23 2013-12-23 21:11:30 2013-12-23 21:12:00 Alex A4536
24 2013-12-23 21:12:00 2013-12-23 21:12:00 Alex A4536
25 2013-12-23 21:12:30 NaT NaN NaN
26 2013-12-23 21:13:00 NaT NaN NaN
................
................
You can then replace the nan with space if you want. Hope this helps.
Upvotes: 2