Reputation: 1493
I am having a dataframe like this:
data= {'Timestamp': ['2018-07-16 14:31:03','2018-07-13 11:59:50','2018-07-13 11:41:07','2018-07-13 10:50:24','2018-07-12 15:33:59','2018-07-12 11:32:52','2018-07-04 13:10:30','2018-07-04 10:37:15' ],
'Maturity': [2019,2019, 2020,2020,2020,2020, 2021,2021],
'Country': ['DE','DE','ES','ES','DE','DE', 'ES','ES'],
'Price': [50.15, 51, 66, 68, 55, 54, 72.7, 73]
}
df = pd.DataFrame(data)
df.index = pd.DatetimeIndex(df.Timestamp)
df.drop(columns=['Timestamp'], inplace=True)
print(df)
resulting in this df:
Timestamp Country Maturity Price
16.07.2018 14:31 DE 2019 50.15
13.07.2018 11:59 DE 2019 51.00
13.07.2018 11:41 ES 2020 66.00
13.07.2018 10:50 ES 2020 68.00
12.07.2018 15:33 DE 2020 55.00
12.07.2018 11:32 DE 2020 54.00
04.07.2018 13:10 ES 2021 72.70
04.07.2018 10:37 ES 2021 73.00
I would like to Resample or Group the dataframe in order to get the last "Price" of each day per "Country" and "Maturity".
The result should look like this:
Timestamp Country Maturity Price
16.07.2018 DE 2019 50.15
13.07.2018 DE 2019 51.00
13.07.2018 ES 2020 66.00
12.07.2018 DE 2020 55.00
04.07.2018 ES 2021 72.70
I have tried with df = df.resample('D', on='Timestamp')['Price'].agg(['last'])
but unfortunately it is resulting in errors.
Can anyone help with this Problem?
Upvotes: 1
Views: 658
Reputation: 11192
another way to solve this without sampling,
use drop_duplicates
with the keys of date, country and maturity,
by default it will keep first record.
data= {'Timestamp': ['2018-07-16 14:31:03','2018-07-13 11:59:50','2018-07-13 11:41:07','2018-07-13 10:50:24','2018-07-12 15:33:59','2018-07-12 11:32:52','2018-07-04 13:10:30','2018-07-04 10:37:15' ],
'Maturity': [2019,2019, 2020,2020,2020,2020, 2021,2021],
'Country': ['DE','DE','ES','ES','DE','DE', 'ES','ES'],
'Price': [50.15, 51, 66, 68, 55, 54, 72.7, 73]
}
df = pd.DataFrame(data)
df.index = pd.DatetimeIndex(df.Timestamp)
df['date']=df.index.date
df= df.drop_duplicates(subset=['date','Country','Maturity'])
df.drop(['Timestamp','date'],axis=1, inplace=True)
print df
Output:
Country Maturity Price
Timestamp
2018-07-16 14:31:03 DE 2019 50.15
2018-07-13 11:59:50 DE 2019 51.00
2018-07-13 11:41:07 ES 2020 66.00
2018-07-12 15:33:59 DE 2020 55.00
2018-07-04 13:10:30 ES 2021 72.70
Upvotes: 1
Reputation: 862641
I think need groupby
with Grouper
and GroupBy.last
:
df = df.groupby(['Maturity','Country', pd.Grouper(freq='D')])['Price'].last().reset_index()
Or use DataFrameGroupBy.resample
, but then necessary remove missing rows by dropna
:
df = df.groupby(['Maturity','Country']).resample('D')['Price'].last().dropna().reset_index()
print (df)
Maturity Country Timestamp Price
0 2019 DE 2018-07-13 51.00
1 2019 DE 2018-07-16 50.15
2 2020 DE 2018-07-12 55.00
3 2020 ES 2018-07-13 66.00
4 2021 ES 2018-07-04 72.70
Upvotes: 1