Markus W
Markus W

Reputation: 1493

Get the last value of the day in a datetimeindexed dataframe filtering by other column values

I am having a dataframe like this:

data= {'Timestamp': ['2018-07-16 14:31:03','2018-07-13 11:59:50','2018-07-13 11:41:07','2018-07-13 10:50:24','2018-07-12 15:33:59','2018-07-12 11:32:52','2018-07-04 13:10:30','2018-07-04 10:37:15' ],
        'Maturity': [2019,2019, 2020,2020,2020,2020, 2021,2021],
        'Country': ['DE','DE','ES','ES','DE','DE', 'ES','ES'],
        'Price': [50.15, 51, 66, 68, 55, 54, 72.7, 73]         
        }
df = pd.DataFrame(data)
df.index = pd.DatetimeIndex(df.Timestamp)
df.drop(columns=['Timestamp'], inplace=True)
print(df)

resulting in this df:

    Timestamp   Country Maturity    Price
16.07.2018 14:31    DE  2019     50.15 
13.07.2018 11:59    DE  2019     51.00 
13.07.2018 11:41    ES  2020     66.00 
13.07.2018 10:50    ES  2020     68.00 
12.07.2018 15:33    DE  2020     55.00 
12.07.2018 11:32    DE  2020     54.00 
04.07.2018 13:10    ES  2021     72.70 
04.07.2018 10:37    ES  2021     73.00 

I would like to Resample or Group the dataframe in order to get the last "Price" of each day per "Country" and "Maturity".

The result should look like this:

Timestamp   Country Maturity Price
16.07.2018  DE      2019     50.15 
13.07.2018  DE      2019     51.00 
13.07.2018  ES      2020     66.00 
12.07.2018  DE      2020     55.00 
04.07.2018  ES      2021     72.70 

I have tried with df = df.resample('D', on='Timestamp')['Price'].agg(['last']) but unfortunately it is resulting in errors.

Can anyone help with this Problem?

Upvotes: 1

Views: 658

Answers (2)

Mohamed Thasin ah
Mohamed Thasin ah

Reputation: 11192

another way to solve this without sampling,

use drop_duplicates with the keys of date, country and maturity, by default it will keep first record.

data= {'Timestamp': ['2018-07-16 14:31:03','2018-07-13 11:59:50','2018-07-13 11:41:07','2018-07-13 10:50:24','2018-07-12 15:33:59','2018-07-12 11:32:52','2018-07-04 13:10:30','2018-07-04 10:37:15' ],
        'Maturity': [2019,2019, 2020,2020,2020,2020, 2021,2021],
        'Country': ['DE','DE','ES','ES','DE','DE', 'ES','ES'],
        'Price': [50.15, 51, 66, 68, 55, 54, 72.7, 73]         
        }
df = pd.DataFrame(data)
df.index = pd.DatetimeIndex(df.Timestamp)
df['date']=df.index.date
df= df.drop_duplicates(subset=['date','Country','Maturity'])
df.drop(['Timestamp','date'],axis=1, inplace=True)
print df

Output:

                    Country  Maturity  Price
Timestamp                                   
2018-07-16 14:31:03      DE      2019  50.15
2018-07-13 11:59:50      DE      2019  51.00
2018-07-13 11:41:07      ES      2020  66.00
2018-07-12 15:33:59      DE      2020  55.00
2018-07-04 13:10:30      ES      2021  72.70

Upvotes: 1

jezrael
jezrael

Reputation: 862641

I think need groupby with Grouper and GroupBy.last:

df = df.groupby(['Maturity','Country', pd.Grouper(freq='D')])['Price'].last().reset_index()

Or use DataFrameGroupBy.resample, but then necessary remove missing rows by dropna:

df = df.groupby(['Maturity','Country']).resample('D')['Price'].last().dropna().reset_index()
print (df)

   Maturity Country  Timestamp  Price
0      2019      DE 2018-07-13  51.00
1      2019      DE 2018-07-16  50.15
2      2020      DE 2018-07-12  55.00
3      2020      ES 2018-07-13  66.00
4      2021      ES 2018-07-04  72.70

Upvotes: 1

Related Questions