M_S_N
M_S_N

Reputation: 2810

pandas merge rows based on same date

I have a dataframe that contains entries for different day in and day out logs. but some logon and logout values occurs at same day. e.g. df


+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
|  Id   |          LoginTime          | LoginReading |         LogoutTime          | LogoutReading | Available | Calculated |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| 10036 | 2019-11-06 10:37:18.3743184 |      5054.68 | 2019-11-06 14:11:52.4833904 |       5057.94 |       500 |        530 |
| 10036 | 2019-11-11 12:46:20.7018683 |      4797.39 | 2019-11-11 15:55:48.9102874 |       4846.86 |       500 |       1091 |
| 10036 | 2019-11-11 15:56:42.7485310 |      4846.86 | 2019-11-11 17:20:40.2587713 |       4866.68 |       500 |        370 |
| 10036 | 2019-11-11 17:21:12.9416691 |      4866.68 | 2019-11-11 18:09:55.8326356 |       4892.33 |       500 |        311 |
| 10036 | 2019-11-14 08:08:00.6290260 |      5080.59 | 2019-11-14 18:43:31.8609822 |       5185.65 |       500 |       2528 |
| 10036 | 2019-11-15 06:00:48.4777280 |      5185.65 | 2019-11-15 06:31:55.0281168 |       5199.28 |       500 |        260 |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+

I am trying to compress/merge rows with similar date to one row. My goal is to achieve something like this:

+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
|  Id   |          LoginTime          | LoginReading |         LogoutTime          | LogoutReading | Available | Calculated |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| 10036 | 2019-11-06 10:37:18.3743184 |      5054.68 | 2019-11-06 14:11:52.4833904 |       5057.94 |       500 |        530 |
| 10036 | 2019-11-11 12:46:20.7018683 |      4797.39 | 2019-11-11 18:09:55.8326356 |       4892.33 |       500 |       1772 |
| 10036 | 2019-11-14 08:08:00.6290260 |      5080.59 | 2019-11-14 18:43:31.8609822 |       5185.65 |       500 |       2528 |
| 10036 | 2019-11-15 06:00:48.4777280 |      5185.65 | 2019-11-15 06:31:55.0281168 |       5199.28 |       500 |        260 |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+

I tried some other similar questions on SO but the logout time is not working.

Upvotes: 1

Views: 1255

Answers (1)

rahlf23
rahlf23

Reputation: 9019

You can create a new column that tracks the day of the login and then you can use groupby() and agg() to compile the values you want:

df['Day'] = df['LoginTime'].str.extract(r'^(\d{4}-\d{2}-\d{2})')

df.groupby('Day').agg({'Id': 'first', 'LoginTime': 'first', 'LoginReading': 'first', 'LogoutTime': 'last',
'LogoutReading': 'last', 'Available': 'first', 'Calculated': 'sum'}).reset_index(drop=True)

Yields:

      Id                    LoginTime  LoginReading  \
0  10036  2019-11-06 10:37:18.3743184       5054.68   
1  10036  2019-11-11 12:46:20.7018683       4797.39   
2  10036  2019-11-14 08:08:00.6290260       5080.59   
3  10036  2019-11-15 06:00:48.4777280       5185.65   

                    LogoutTime  LogoutReading  Available  Calculated  
0  2019-11-06 14:11:52.4833904        5057.94        500         530  
1  2019-11-11 18:09:55.8326356        4892.33        500        1772  
2  2019-11-14 18:43:31.8609822        5185.65        500        2528  
3  2019-11-15 06:31:55.0281168        5199.28        500         260  

Upvotes: 1

Related Questions