M_S_N
M_S_N

Reputation: 2810

pandas: add/append rows using groupby w.r.t date

i have a following DataFrame which looks something like this

+------+------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
|  Id  | B_Id |             Start             |              End              | Day | Week_day |        f        |           LoginTime           |          LogoutTime           |
+------+------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
| 4960 | 2213 | 2019-11-06 16:21:25.722382200 | 2019-11-06 16:50:32.219562100 |   6 |        2 | 00:44:07.348063 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4967 | NaN  | 2019-11-06 17:05:45.411147300 | 2019-11-06 18:09:32.823322100 |   6 |        2 | 00:15:13.191585 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4973 | 2234 | 2019-11-06 18:58:11.333021900 | 2019-11-06 18:58:44.839433600 |   6 |        2 | 00:48:38.509699 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 5131 | NaN  | 2019-11-11 17:50:06.068008900 | 2019-11-11 18:22:28.992619600 |  11 |        0 | 00:03:45.366140 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5135 | NaN  | 2019-11-11 18:55:06.819281700 | 2019-11-11 19:19:47.855423700 |  11 |        0 | 00:32:37.826662 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5138 | NaN  | 2019-11-11 19:45:12.931469700 | 2019-11-11 20:00:22.358470600 |  11 |        0 | 00:25:25.076046 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5139 | NaN  | 2019-11-11 20:13:29.918047600 | 2019-11-11 20:53:23.214166300 |  11 |        0 | 00:13:07.559577 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5142 | NaN  | 2019-11-11 21:41:48.524924200 | 2019-11-11 21:49:10.308323100 |  11 |        0 | 00:48:25.310757 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5144 | NaN  | 2019-11-11 22:00:51.106021500 | 2019-11-11 22:19:48.857907400 |  11 |        0 | 00:11:40.797698 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5145 | NaN  | 2019-11-11 22:30:14.077789100 | 2019-11-11 22:56:42.090168300 |  11 |        0 | 00:10:25.219881 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5225 | NaN  | 2019-11-14 14:14:08.879589200 | 2019-11-14 14:52:40.396641500 |  14 |        3 | 01:06:08.250563 | 2019-11-14 13:08:00.629026000 | 2019-11-14 23:43:31.860982200 |
+------+------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+

I want to append row to the end of each day containing difference of logOut time and last End date eg: 2019-11-06 19:11:52.483390400 - 2019-11-06 18:58:44.839433600

so,the final output looks something like this:


+------+-----------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
|  Id  | B_Id      |             Start             |              End              | Day | Week_day |        f        |           LoginTime           |          LogoutTime           |
+------+-----------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+
| 4960 | 2213      | 2019-11-06 16:21:25.722382200 | 2019-11-06 16:50:32.219562100 |   6 |        2 | 00:44:07.348063 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4967 | NaN       | 2019-11-06 17:05:45.411147300 | 2019-11-06 18:09:32.823322100 |   6 |        2 | 00:15:13.191585 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 4973 | 2234      | 2019-11-06 18:58:11.333021900 | 2019-11-06 18:58:44.839433600 |   6 |        2 | 00:48:38.509699 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| nan  | NaN       | nat                           | nat                           |   6 |        2 | 00:13:08.509699 | 2019-11-06 15:37:18.374318400 | 2019-11-06 19:11:52.483390400 |
| 5131 | NaN       | 2019-11-11 17:50:06.068008900 | 2019-11-11 18:22:28.992619600 |  11 |        0 | 00:03:45.366140 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5135 | NaN       | 2019-11-11 18:55:06.819281700 | 2019-11-11 19:19:47.855423700 |  11 |        0 | 00:32:37.826662 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5138 | NaN       | 2019-11-11 19:45:12.931469700 | 2019-11-11 20:00:22.358470600 |  11 |        0 | 00:25:25.076046 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5139 | NaN       | 2019-11-11 20:13:29.918047600 | 2019-11-11 20:53:23.214166300 |  11 |        0 | 00:13:07.559577 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5142 | NaN       | 2019-11-11 21:41:48.524924200 | 2019-11-11 21:49:10.308323100 |  11 |        0 | 00:48:25.310757 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5144 | NaN       | 2019-11-11 22:00:51.106021500 | 2019-11-11 22:19:48.857907400 |  11 |        0 | 00:11:40.797698 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| 5145 | NaN       | 2019-11-11 22:30:14.077789100 | 2019-11-11 22:56:42.090168300 |  11 |        0 | 00:10:25.219881 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
| nan  | NaN       | nat                           | nat                           |  11 |        0 | 00:13:13.219881 | 2019-11-11 17:46:20.701868300 | 2019-11-11 23:09:55.832635600 |
+------+-----------+-------------------------------+-------------------------------+-----+----------+-----------------+-------------------------------+-------------------------------+

I am only interested in f column rest can have NaN and nulls etc or may be duplicates. Any guidance will be appreciated.

Upvotes: 2

Views: 66

Answers (1)

jezrael
jezrael

Reputation: 862441

Use GroupBy.last by dates of some column, e.g. LoginTime, then set column f by subtract and another columns to missing values, DataFrame.append to original and sorting by some columns like LoginTime and Start:

#if necessary convert to timedeltas
df['f'] = pd.to_timedelta(df['f'])
print (df.dtypes)
Id                      int64
B_Id                  float64
Start          datetime64[ns]
End            datetime64[ns]
Day                     int64
Week_day                int64
f             timedelta64[ns]
LoginTime      datetime64[ns]
LogoutTime     datetime64[ns]
dtype: object


df1 = df.groupby(df['LoginTime'].dt.date).last()
df1 = (df1.assign(f = df1['LogoutTime'].sub(df1['End']),
                  Id = np.nan,
                  B_Id = np.nan, 
                  Start = np.nan,
                  End = np.nan)
          .append(df)
          .sort_values(['LoginTime', 'Start'])
          .reset_index(drop=True))

print (df1)
        Id    B_Id                         Start  \
0   4960.0  2213.0 2019-11-06 16:21:25.722382200   
1   4967.0     NaN 2019-11-06 17:05:45.411147300   
2   4973.0  2234.0 2019-11-06 18:58:11.333021900   
3      NaN     NaN                           NaT   
4   5131.0     NaN 2019-11-11 17:50:06.068008900   
5   5135.0     NaN 2019-11-11 18:55:06.819281700   
6   5138.0     NaN 2019-11-11 19:45:12.931469700   
7   5139.0     NaN 2019-11-11 20:13:29.918047600   
8   5142.0     NaN 2019-11-11 21:41:48.524924200   
9   5144.0     NaN 2019-11-11 22:00:51.106021500   
10  5145.0     NaN 2019-11-11 22:30:14.077789100   
11     NaN     NaN                           NaT   
12  5225.0     NaN 2019-11-14 14:14:08.879589200   
13     NaN     NaN                           NaT   

                             End  Day  Week_day               f  \
0  2019-11-06 16:50:32.219562100    6         2 00:44:07.348063   
1  2019-11-06 18:09:32.823322100    6         2 00:15:13.191585   
2  2019-11-06 18:58:44.839433600    6         2 00:48:38.509699   
3                            NaT    6         2 00:13:07.643956   
4  2019-11-11 18:22:28.992619600   11         0 00:03:45.366140   
5  2019-11-11 19:19:47.855423700   11         0 00:32:37.826662   
6  2019-11-11 20:00:22.358470600   11         0 00:25:25.076046   
7  2019-11-11 20:53:23.214166300   11         0 00:13:07.559577   
8  2019-11-11 21:49:10.308323100   11         0 00:48:25.310757   
9  2019-11-11 22:19:48.857907400   11         0 00:11:40.797698   
10 2019-11-11 22:56:42.090168300   11         0 00:10:25.219881   
11                           NaT   11         0 00:13:13.742467   
12 2019-11-14 14:52:40.396641500   14         3 01:06:08.250563   
13                           NaT   14         3 08:50:51.464340   

                       LoginTime                    LogoutTime  
0  2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400  
1  2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400  
2  2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400  
3  2019-11-06 15:37:18.374318400 2019-11-06 19:11:52.483390400  
4  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
5  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
6  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
7  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
8  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
9  2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
10 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
11 2019-11-11 17:46:20.701868300 2019-11-11 23:09:55.832635600  
12 2019-11-14 13:08:00.629026000 2019-11-14 23:43:31.860982200  
13 2019-11-14 13:08:00.629026000 2019-11-14 23:43:31.860982200  

Upvotes: 1

Related Questions