Reputation: 93
So, this question is continuation of my previous one :
inserting row with values at certain index taking too long
My tables are structured like this:
+-------------------------------------------------------+
| CarID CarNumber GPS DateTime Speed |
+-------------------------------------------------------+
| WFV303 303 104:58 04.02.2019 10:10:51 21 |
| WFV303 303 104:58 04.02.2019 10:10:54 23 |
| WFV303 303 104:58 04.02.2019 10:10:59 23 |
| WFV303 303 104:58 04.02.2019 10:11:01 24 |
| FBV404 404 105:59 04.02.2019 12:10:20 19 |
| FBV404 404 105:59 04.02.2019 12:10:25 19 |
+-------------------------------------------------------+
CSV file which I use structured in a way that you have all records for one certain car going one after another until it reaches next car and so on.
For the analysis purposes I calculate time difference in seconds to get my time delta and to account for different CarNumber I used approach suggested by @mcsoini :
+------------------------------------------------------------+
| CarID CarNumber GPS DateTime Speed dt |
+------------------------------------------------------------+
| WFV303 303 104:58 04.02.2019 10:10:51 21 NaT |
| WFV303 303 104:58 04.02.2019 10:10:54 23 3 |
| WFV303 303 104:58 04.02.2019 10:10:59 23 5 |
| WFV303 303 104:58 04.02.2019 10:11:01 24 2 |
| 0 0 0 0 0 0 |
| FBV303 404 105:59 04.02.2019 12:10:20 19 NaT |
| FBV303 404 105:59 04.02.2019 12:10:25 19 5 |
+------------------------------------------------------------+
However, after grouping by CarNumber I get border cases like this (when records from one day ends and another day starts):
+--------------------------------------------------------------+
| CarID CarNumber GPS DateTime Speed dt |
+--------------------------------------------------------------+
| WFV303 303 104:58 04.02.2019 16:41:05 21 25 |
| WFV303 303 104:58 05.02.2019 08:18:16 23 56231 |
+--------------------------------------------------------------+
Which means that it takes difference in time between 5th and 4th February and I want it to be zero instead, is there any efficient way of achieve this?
Thank you!
Upvotes: 0
Views: 71
Reputation: 6642
Example data:
df = pd.DataFrame({'CarNumber': [303] * 6 + [404] * 2 + [405] * 3,
'othercol': range(11),
'DateTime': pd.date_range('02.04.2019 16:00:00', '02.05.2019 12:00:00', freq='2H')})
df
CarNumber othercol DateTime
0 303 0 2019-02-04 16:00:00
1 303 1 2019-02-04 18:00:00
2 303 2 2019-02-04 20:00:00
3 303 3 2019-02-04 22:00:00
4 303 4 2019-02-05 00:00:00
5 303 5 2019-02-05 02:00:00
6 404 6 2019-02-05 04:00:00
7 404 7 2019-02-05 06:00:00
8 405 8 2019-02-05 08:00:00
9 405 9 2019-02-05 10:00:00
10 405 10 2019-02-05 12:00:00
Get time difference by car, day:
df['dt'] = df.groupby(['CarNumber', df.DateTime.dt.date
]).DateTime.diff()
Result:
CarNumber othercol DateTime date dt
0 303 0 2019-02-04 16:00:00 2019-02-04 NaT
1 303 1 2019-02-04 18:00:00 2019-02-04 02:00:00
2 303 2 2019-02-04 20:00:00 2019-02-04 02:00:00
3 303 3 2019-02-04 22:00:00 2019-02-04 02:00:00
4 303 4 2019-02-05 00:00:00 2019-02-05 NaT
5 303 5 2019-02-05 02:00:00 2019-02-05 02:00:00
6 404 6 2019-02-05 04:00:00 2019-02-05 NaT
7 404 7 2019-02-05 06:00:00 2019-02-05 02:00:00
8 405 8 2019-02-05 08:00:00 2019-02-05 NaT
9 405 9 2019-02-05 10:00:00 2019-02-05 02:00:00
10 405 10 2019-02-05 12:00:00 2019-02-05 02:00:00
Note the difference zero for car number 303 when switching from day 4 to 5.
Upvotes: 1
Reputation: 30920
as I proposed in the comment:How to account for the different dates when calculating change it doesn't need apply
and neither create an additional column:
df['DateTime']=pd.to_datetime(df['DateTIme'])
df['dt']=df.groupby(['CarNumber',df.DateTime.dt.date],sort=False)['DateTime'].diff()
If you want fill with 0 use:
df['dt']=( df.groupby(['CarNumber',df.DateTime.dt.date],sort=False)['DateTime']
.diff()
.fillna(0) )
Upvotes: 1