teteh May
teteh May

Reputation: 455

Difference of value between two different times at the same date

I have a dataframe df as below:

Datetime                Value
2020-03-01 08:00:00      10
2020-03-01 10:00:00      12
2020-03-01 12:00:00      15
2020-03-02 09:00:00       1
2020-03-02 10:00:00       3
2020-03-02 13:00:00       8
2020-03-03 10:00:00      20
2020-03-03 12:00:00      25
2020-03-03 14:00:00      15

I would like to calculate the difference between the value on the first time of each date and the last time of each date (ignoring the value of other time within a date), so the result will be:

Datetime      Value_Difference
2020-03-01          5
2020-03-02          7
2020-03-03         -5

I have been doing this using a for loop, but it is slow (as expected) when I have larger data. Any help will be appreciated.

Upvotes: 1

Views: 788

Answers (2)

EddyG
EddyG

Reputation: 685

Shaido's method works, but might be slow due to the groupby on very large sets

Another possible way is to take a difference from dates converted to int and only grab the values necessary without a loop.

idx = df.index

loc = np.diff(idx.strftime('%Y%m%d').astype(int).values).nonzero()[0]

loc1 = np.append(0,loc)

loc2 = np.append(loc,len(idx)-1)

res = df.values[loc2]-df.values[loc1]

df = pd.DataFrame(index=idx.date[loc1],values=res,columns=['values'])

Upvotes: 0

Shaido
Shaido

Reputation: 28322

One solution would be to make sure the data is sorted by time, group by the data and then take the first and last value in each day. This works since pandas will preserve the order during groupby, see e.g. here.

df = df.sort_values(by='Datetime').groupby(df['Datetime'].dt.date).agg({'Value': ['first', 'last']})
df['Value_Difference'] = df['Value']['last'] - df['Value']['first']
df = df.drop('Value', axis=1).reset_index()

Result:

Datetime      Value_Difference
2020-03-01          5
2020-03-02          7
2020-03-03         -5

Upvotes: 2

Related Questions