k.ko3n
k.ko3n

Reputation: 954

For each day, subtract last time row from the first time row with Python

I have 5 minutes-interval meteorological df for many days. Some rows and days are missing. Index is datetime format.

DateTime               Data
2016-01-01 07:00:00     1
2016-01-01 10:30:00     2
2016-01-01 16:55:00     3

2016-03-25 09:25:00     4
2016-03-25 11:30:00     5
2016-03-25 13:35:00     6
2016-03-25 17:40:00     7 

2017-11-09 12:00:00     8
2017-11-09 13:05:00     9
2017-11-09 16:10:00    10
2017-11-09 18:15:00    11
2017-11-09 19:20:00    12
2017-11-09 20:25:00    13

I want to make a new_df of daily data with column Data_diff. Column Data_diff should contain the result of subtraction of the last data to the first data from each day.

The expected result is:

DateTime      Data_diff
2016-01-01    2
2016-03-25    3
2017-11-09    5

I have no idea what to do. Crossed in mind to use

new_df = df.diff()

but, it was not true for this case.

Edit: I also try the following

new_df = df.resample('D')['Data'].agg(['first','last'])
new_df['Data_diff'] = new_df['first'] - new_df['last']

but the result is not correct.

Upvotes: 3

Views: 920

Answers (2)

b-fg
b-fg

Reputation: 4137

Use pandas.groupby together with dt.day and the apply the function you look for.

s = df.groupby(df['DateTime'].dt.day)['Data'].apply(lambda x: x.values[-1]-x.values[0])
print(s)
#           Data
# DateTime
# 1            2
# 9            5
# 25           3

Upvotes: 1

jezrael
jezrael

Reputation: 862511

Function resample add all missing days filled by NaNs.

You can remove only these days by DataFrame.dropna:

new_df = df.resample('D')['Data'].agg(['first','last']).dropna(how='all')
new_df['Data_diff'] =  new_df['last'] - new_df['first']
print (new_df)
            first  last  Data_diff
DateTime                          
2016-01-01    1.0   3.0        2.0
2016-03-25    4.0   7.0        3.0
2017-11-09    8.0  13.0        5.0

Upvotes: 1

Related Questions