Adren
Adren

Reputation: 93

How to account for the different dates when calculating change

So, this question is continuation of my previous one :

inserting row with values at certain index taking too long

My tables are structured like this:

+-------------------------------------------------------+
| CarID  CarNumber   GPS     DateTime             Speed |
+-------------------------------------------------------+
| WFV303   303      104:58  04.02.2019 10:10:51    21   |
| WFV303   303      104:58  04.02.2019 10:10:54    23   |
| WFV303   303      104:58  04.02.2019 10:10:59    23   |
| WFV303   303      104:58  04.02.2019 10:11:01    24   |
| FBV404   404      105:59  04.02.2019 12:10:20    19   |
| FBV404   404      105:59  04.02.2019 12:10:25    19   |
+-------------------------------------------------------+

CSV file which I use structured in a way that you have all records for one certain car going one after another until it reaches next car and so on.

For the analysis purposes I calculate time difference in seconds to get my time delta and to account for different CarNumber I used approach suggested by @mcsoini :

+------------------------------------------------------------+
| CarID  CarNumber   GPS     DateTime             Speed  dt  |
+------------------------------------------------------------+
| WFV303   303      104:58  04.02.2019 10:10:51    21   NaT  |
| WFV303   303      104:58  04.02.2019 10:10:54    23    3   |
| WFV303   303      104:58  04.02.2019 10:10:59    23    5   |
| WFV303   303      104:58  04.02.2019 10:11:01    24    2   |
| 0        0        0       0                      0     0   |
| FBV303   404      105:59  04.02.2019 12:10:20    19   NaT  |
| FBV303   404      105:59  04.02.2019 12:10:25    19    5   | 
+------------------------------------------------------------+

However, after grouping by CarNumber I get border cases like this (when records from one day ends and another day starts):

+--------------------------------------------------------------+
|  CarID  CarNumber   GPS     DateTime             Speed  dt   |
+--------------------------------------------------------------+
| WFV303   303      104:58  04.02.2019 16:41:05    21    25    |
| WFV303   303      104:58  05.02.2019 08:18:16    23    56231 |
+--------------------------------------------------------------+

Which means that it takes difference in time between 5th and 4th February and I want it to be zero instead, is there any efficient way of achieve this?

Thank you!

Upvotes: 0

Views: 71

Answers (2)

mcsoini
mcsoini

Reputation: 6642

Example data:

df = pd.DataFrame({'CarNumber': [303] * 6 + [404] * 2 + [405] * 3,
                   'othercol': range(11),
                   'DateTime': pd.date_range('02.04.2019 16:00:00', '02.05.2019 12:00:00', freq='2H')})

df
    CarNumber  othercol            DateTime
0         303         0 2019-02-04 16:00:00
1         303         1 2019-02-04 18:00:00
2         303         2 2019-02-04 20:00:00
3         303         3 2019-02-04 22:00:00
4         303         4 2019-02-05 00:00:00
5         303         5 2019-02-05 02:00:00
6         404         6 2019-02-05 04:00:00
7         404         7 2019-02-05 06:00:00
8         405         8 2019-02-05 08:00:00
9         405         9 2019-02-05 10:00:00
10        405        10 2019-02-05 12:00:00

Get time difference by car, day:

df['dt'] = df.groupby(['CarNumber', df.DateTime.dt.date
]).DateTime.diff()

Result:

    CarNumber  othercol            DateTime        date       dt
0         303         0 2019-02-04 16:00:00  2019-02-04      NaT
1         303         1 2019-02-04 18:00:00  2019-02-04 02:00:00
2         303         2 2019-02-04 20:00:00  2019-02-04 02:00:00
3         303         3 2019-02-04 22:00:00  2019-02-04 02:00:00
4         303         4 2019-02-05 00:00:00  2019-02-05      NaT
5         303         5 2019-02-05 02:00:00  2019-02-05 02:00:00
6         404         6 2019-02-05 04:00:00  2019-02-05      NaT
7         404         7 2019-02-05 06:00:00  2019-02-05 02:00:00
8         405         8 2019-02-05 08:00:00  2019-02-05      NaT
9         405         9 2019-02-05 10:00:00  2019-02-05 02:00:00
10        405        10 2019-02-05 12:00:00  2019-02-05 02:00:00

Note the difference zero for car number 303 when switching from day 4 to 5.

Upvotes: 1

ansev
ansev

Reputation: 30920

as I proposed in the comment:How to account for the different dates when calculating change it doesn't need apply and neither create an additional column:

df['DateTime']=pd.to_datetime(df['DateTIme'])
df['dt']=df.groupby(['CarNumber',df.DateTime.dt.date],sort=False)['DateTime'].diff()

If you want fill with 0 use:

df['dt']=( df.groupby(['CarNumber',df.DateTime.dt.date],sort=False)['DateTime']
             .diff()
             .fillna(0) )

Upvotes: 1

Related Questions