I have the following DataFrame, which looks something like this:
+------+------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
| Id | B_Id | Start | End | Day | Week_day | f | LoginTime | LogoutTime |
+------+------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
| 4960 | 2213 | 2019-11-06 16:21:25.722382200 | 2019-11-06 16:50:32.219562100 | 6 | 2 | 00:44:07.348063 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4967 | NaN | 2019-11-06 17:05:45.411147300 | 2019-11-06 18:09:32.823322100 | 6 | 2 | 00:15:13.191585 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4973 | 2234 | 2019-11-06 18:58:11.333021900 | 2019-11-06 18:58:44.839433600 | 6 | 2 | 00:48:38.509699 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 5131 | NaN | 2019-11-11 17:50:06.068008900 | 2019-11-11 18:22:28.992619600 | 11 | 0 | 00:03:45.366140 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5135 | NaN | 2019-11-11 18:55:06.819281700 | 2019-11-11 19:19:47.855423700 | 11 | 0 | 00:32:37.826662 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5138 | NaN | 2019-11-11 19:45:12.931469700 | 2019-11-11 20:00:22.358470600 | 11 | 0 | 00:25:25.076046 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5139 | NaN | 2019-11-11 20:13:29.918047600 | 2019-11-11 20:53:23.214166300 | 11 | 0 | 00:13:07.559577 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5142 | NaN | 2019-11-11 21:41:48.524924200 | 2019-11-11 21:49:10.308323100 | 11 | 0 | 00:48:25.310757 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5144 | NaN | 2019-11-11 22:00:51.106021500 | 2019-11-11 22:19:48.857907400 | 11 | 0 | 00:11:40.797698 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5145 | NaN | 2019-11-11 22:30:14.077789100 | 2019-11-11 22:56:42.090168300 | 11 | 0 | 00:10:25.219881 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5225 | NaN | 2019-11-14 14:14:08.879589200 | 2019-11-14 14:52:40.396641500 | 14 | 3 | 01:06:08.250563 | 2019-11-14 13:08:00.629026000 | 2019-11-14 23:43:31.860982200 |
+------+------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
I want to append a row to the end of each day containing the difference between the LogoutTime and the last End date, e.g. 2019-11-06 19:11:52.483390400 - 2019-11-06 18:58:44.839433600,
so the final output looks something like this:
+------+-----------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
| Id | B_Id | Start | End | Day | Week_day | f | LoginTime | LogoutTime |
+------+-----------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
| 4960 | 2213 | 2019-11-06 16:21:25.722382200 | 2019-11-06 16:50:32.219562100 | 6 | 2 | 00:44:07.348063 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4967 | NaN | 2019-11-06 17:05:45.411147300 | 2019-11-06 18:09:32.823322100 | 6 | 2 | 00:15:13.191585 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4973 | 2234 | 2019-11-06 18:58:11.333021900 | 2019-11-06 18:58:44.839433600 | 6 | 2 | 00:48:38.509699 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| NaN  | NaN       | NaT                           | NaT                           | 6   | 2        | 00:13:08.509699 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 5131 | NaN | 2019-11-11 17:50:06.068008900 | 2019-11-11 18:22:28.992619600 | 11 | 0 | 00:03:45.366140 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5135 | NaN | 2019-11-11 18:55:06.819281700 | 2019-11-11 19:19:47.855423700 | 11 | 0 | 00:32:37.826662 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5138 | NaN | 2019-11-11 19:45:12.931469700 | 2019-11-11 20:00:22.358470600 | 11 | 0 | 00:25:25.076046 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5139 | NaN | 2019-11-11 20:13:29.918047600 | 2019-11-11 20:53:23.214166300 | 11 | 0 | 00:13:07.559577 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5142 | NaN | 2019-11-11 21:41:48.524924200 | 2019-11-11 21:49:10.308323100 | 11 | 0 | 00:48:25.310757 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5144 | NaN | 2019-11-11 22:00:51.106021500 | 2019-11-11 22:19:48.857907400 | 11 | 0 | 00:11:40.797698 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5145 | NaN | 2019-11-11 22:30:14.077789100 | 2019-11-11 22:56:42.090168300 | 11 | 0 | 00:10:25.219881 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| NaN  | NaN       | NaT                           | NaT                           | 11  | 0        | 00:13:13.219881 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
+------+-----------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
I am only interested in the f column; the rest can have NaN, nulls, or duplicates. Any guidance will be appreciated.
Use GroupBy.last grouped by the dates of some column, e.g. LoginTime. Then set column f by subtracting End from LogoutTime, set the other columns to missing values, DataFrame.append the result back to the original, and sort by some columns like LoginTime and Start:
import numpy as np
import pandas as pd

# if necessary, convert column f to timedeltas
df['f'] = pd.to_timedelta(df['f'])
print (df.dtypes)
Id int64
B_Id float64
Start datetime64[ns]
End datetime64[ns]
Day int64
Week_day int64
f timedelta64[ns]
LoginTime datetime64[ns]
LogoutTime datetime64[ns]
dtype: object
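If the datetime columns had instead been read in as plain strings, a pd.to_datetime conversion would be needed first; this is only a hedge, since the dtypes output above already shows datetime64[ns]:

# only needed if the datetime columns are still strings
for col in ['Start', 'End', 'LoginTime', 'LogoutTime']:
    df[col] = pd.to_datetime(df[col])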
# take the last row of each day (grouped by the date part of LoginTime)
df1 = df.groupby(df['LoginTime'].dt.date).last()
# overwrite f with the logout gap, blank out the remaining columns,
# then append back to the original and restore chronological order
df1 = (df1.assign(f = df1['LogoutTime'].sub(df1['End']),
                  Id = np.nan,
                  B_Id = np.nan,
                  Start = np.nan,
                  End = np.nan)
          .append(df)
          .sort_values(['LoginTime', 'Start'])
          .reset_index(drop=True))
print (df1)
Id B_Id Start \
0 4960.0 2213.0 2019-11-06 16:21:25.722382200
1 4967.0 NaN 2019-11-06 17:05:45.411147300
2 4973.0 2234.0 2019-11-06 18:58:11.333021900
3 NaN NaN NaT
4 5131.0 NaN 2019-11-11 17:50:06.068008900
5 5135.0 NaN 2019-11-11 18:55:06.819281700
6 5138.0 NaN 2019-11-11 19:45:12.931469700
7 5139.0 NaN 2019-11-11 20:13:29.918047600
8 5142.0 NaN 2019-11-11 21:41:48.524924200
9 5144.0 NaN 2019-11-11 22:00:51.106021500
10 5145.0 NaN 2019-11-11 22:30:14.077789100
11 NaN NaN NaT
12 5225.0 NaN 2019-11-14 14:14:08.879589200
13 NaN NaN NaT
End Day Week_day f \
0 2019-11-06 16:50:32.219562100 6 2 00:44:07.348063
1 2019-11-06 18:09:32.823322100 6 2 00:15:13.191585
2 2019-11-06 18:58:44.839433600 6 2 00:48:38.509699
3 NaT 6 2 00:13:07.643956
4 2019-11-11 18:22:28.992619600 11 0 00:03:45.366140
5 2019-11-11 19:19:47.855423700 11 0 00:32:37.826662
6 2019-11-11 20:00:22.358470600 11 0 00:25:25.076046
7 2019-11-11 20:53:23.214166300 11 0 00:13:07.559577
8 2019-11-11 21:49:10.308323100 11 0 00:48:25.310757
9 2019-11-11 22:19:48.857907400 11 0 00:11:40.797698
10 2019-11-11 22:56:42.090168300 11 0 00:10:25.219881
11 NaT 11 0 00:13:13.742467
12 2019-11-14 14:52:40.396641500 14 3 01:06:08.250563
13 NaT 14 3 08:50:51.464340
LoginTime LogoutTime
0 2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400
1 2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400
2 2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400
3 2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400
4 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600
5 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600
6 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600
7 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600
8 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600
9 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600
10 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600
11 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600
12 2019-11-14 13:08:00.629026000 2019-11-14 23:43:31.860982200
13 2019-11-14 13:08:00.629026000 2019-11-14 23:43:31.860982200
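Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. A sketch of the same logic with pd.concat for newer versions (Start and End are set to pd.NaT here so those columns keep their datetime dtype through the concat):

# pandas >= 2.0: build the per-day summary rows, then concat instead of append
last = df.groupby(df['LoginTime'].dt.date).last()
last = last.assign(f=last['LogoutTime'].sub(last['End']),
                   Id=np.nan, B_Id=np.nan, Start=pd.NaT, End=pd.NaT)
df1 = (pd.concat([df, last], ignore_index=True)
         .sort_values(['LoginTime', 'Start'])
         .reset_index(drop=True))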