Reputation: 2810
I have a dataframe that contains entries for different day in and day out logs. but some logon and logout values occurs at same day. e.g. df
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| Id | LoginTime | LoginReading | LogoutTime | LogoutReading | Available | Calculated |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| 10036 | 2019-11-06 10:37:18.3743184 | 5054.68 | 2019-11-06 14:11:52.4833904 | 5057.94 | 500 | 530 |
| 10036 | 2019-11-11 12:46:20.7018683 | 4797.39 | 2019-11-11 15:55:48.9102874 | 4846.86 | 500 | 1091 |
| 10036 | 2019-11-11 15:56:42.7485310 | 4846.86 | 2019-11-11 17:20:40.2587713 | 4866.68 | 500 | 370 |
| 10036 | 2019-11-11 17:21:12.9416691 | 4866.68 | 2019-11-11 18:09:55.8326356 | 4892.33 | 500 | 311 |
| 10036 | 2019-11-14 08:08:00.6290260 | 5080.59 | 2019-11-14 18:43:31.8609822 | 5185.65 | 500 | 2528 |
| 10036 | 2019-11-15 06:00:48.4777280 | 5185.65 | 2019-11-15 06:31:55.0281168 | 5199.28 | 500 | 260 |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
I am trying to compress/merge rows with similar date to one row. My goal is to achieve something like this:
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| Id | LoginTime | LoginReading | LogoutTime | LogoutReading | Available | Calculated |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
| 10036 | 2019-11-06 10:37:18.3743184 | 5054.68 | 2019-11-06 14:11:52.4833904 | 5057.94 | 500 | 530 |
| 10036 | 2019-11-11 12:46:20.7018683 | 4797.39 | 2019-11-11 18:09:55.8326356 | 4892.33 | 500 | 1772 |
| 10036 | 2019-11-14 08:08:00.6290260 | 5080.59 | 2019-11-14 18:43:31.8609822 | 5185.65 | 500 | 2528 |
| 10036 | 2019-11-15 06:00:48.4777280 | 5185.65 | 2019-11-15 06:31:55.0281168 | 5199.28 | 500 | 260 |
+-------+-----------------------------+--------------+-----------------------------+---------------+-----------+------------+
I tried some other similar questions on SO but the logout time is not working.
Upvotes: 1
Views: 1255
Reputation: 9019
You can create a new column that tracks the day of the login and then you can use groupby()
and agg()
to compile the values you want:
df['Day'] = df['LoginTime'].str.extract(r'^(\d{4}-\d{2}-\d{2})')
df.groupby('Day').agg({'Id': 'first', 'LoginTime': 'first', 'LoginReading': 'first', 'LogoutTime': 'last',
'LogoutReading': 'last', 'Available': 'first', 'Calculated': 'sum'}).reset_index(drop=True)
Yields:
Id LoginTime LoginReading \
0 10036 2019-11-06 10:37:18.3743184 5054.68
1 10036 2019-11-11 12:46:20.7018683 4797.39
2 10036 2019-11-14 08:08:00.6290260 5080.59
3 10036 2019-11-15 06:00:48.4777280 5185.65
LogoutTime LogoutReading Available Calculated
0 2019-11-06 14:11:52.4833904 5057.94 500 530
1 2019-11-11 18:09:55.8326356 4892.33 500 1772
2 2019-11-14 18:43:31.8609822 5185.65 500 2528
3 2019-11-15 06:31:55.0281168 5199.28 500 260
Upvotes: 1