Reputation: 954
I have 5 minutes-interval meteorological df for many days. Some rows and days are missing. Index is datetime format.
DateTime Data
2016-01-01 07:00:00 1
2016-01-01 10:30:00 2
2016-01-01 16:55:00 3
2016-03-25 09:25:00 4
2016-03-25 11:30:00 5
2016-03-25 13:35:00 6
2016-03-25 17:40:00 7
2017-11-09 12:00:00 8
2017-11-09 13:05:00 9
2017-11-09 16:10:00 10
2017-11-09 18:15:00 11
2017-11-09 19:20:00 12
2017-11-09 20:25:00 13
I want to make a new_df of daily data with column Data_diff. Column Data_diff should contain the result of subtraction of the last data to the first data from each day.
The expected result is:
DateTime Data_diff
2016-01-01 2
2016-03-25 3
2017-11-09 5
I have no idea what to do. Crossed in mind to use
new_df = df.diff()
but, it was not true for this case.
Edit: I also try the following
new_df = df.resample('D')['Data'].agg(['first','last'])
new_df['Data_diff'] = new_df['first'] - new_df['last']
but the result is not correct.
Upvotes: 3
Views: 920
Reputation: 4137
Use pandas.groupby
together with dt.day
and the apply the function you look for.
s = df.groupby(df['DateTime'].dt.day)['Data'].apply(lambda x: x.values[-1]-x.values[0])
print(s)
# Data
# DateTime
# 1 2
# 9 5
# 25 3
Upvotes: 1
Reputation: 862511
Function resample
add all missing days filled by NaN
s.
You can remove only these days by DataFrame.dropna
:
new_df = df.resample('D')['Data'].agg(['first','last']).dropna(how='all')
new_df['Data_diff'] = new_df['last'] - new_df['first']
print (new_df)
first last Data_diff
DateTime
2016-01-01 1.0 3.0 2.0
2016-03-25 4.0 7.0 3.0
2017-11-09 8.0 13.0 5.0
Upvotes: 1