Reputation: 605
trying to arrange the below DF
pdf = pd.DataFrame({'num' : ['A1', "A1", 'A1', 'A2', "A3", 'A3', "A3", 'A3', 'A3'],
'end_date' : ['2020-12-31', '2019-09-30', '2017-08-31', '2019-12-31', '2017-12-31', '2016-12-31', '2015-12-31', '2014-12-31', '2013-12-31'],
'amount' : [12000, 15000, 2000, 400000, 56500, 89000, 100000, 500, 8000],
'product' : ['car', 'bike', 'other', 'house', 'other', 'other', 'other', 'other', 'other'] })
pdf
num end_date amount product
A1 2020-12-31 12000 car
A1 2019-09-30 15000 bike
A1 2017-08-31 2000 other
A2 2019-12-31 400000 house
A3 2017-12-31 56500 other
A3 2016-12-31 89000 other
A3 2015-12-31 100000 other
A3 2014-12-31 500 other
A3 2013-12-31 8000 other
The aim is to keep only the last 3 values as below
num N N-1 N-2 product
A1 12000 NaN NaN car
A1 15000 NaN NaN bike
A1 2000 NaN NaN other
A2 400000 NaN NaN house
A3 56500 89000 100000 other
That's what I've tried so far without any success..
pdf.pivot_table(index = ['num', 'product'], columns = ['end_date', ], values = 'amount').reset_index()
num product 2013-12-31 2014-12-31 2015-12-31 2016-12-31 2017-08-31 2017-12-31 2019-09-30 2019-12-31 2020-12-31
A1 bike NaN NaN NaN NaN NaN NaN 15,000.00 NaN NaN
A1 car NaN NaN NaN NaN NaN NaN NaN NaN 12,000.00
A1 other NaN NaN NaN NaN 2,000.00 NaN NaN NaN NaN
A2 house NaN NaN NaN NaN NaN NaN NaN 400,000.00 NaN
A3 other 8,000.00 500.00 100,000.00 89,000.00 NaN 56,500.00 NaN NaN NaN
Upvotes: 2
Views: 56
Reputation: 8219
The idea is to groupby
'num' and 'product', pick three largest dates for each group, and unstack
the second level of the index. The rest is making it look pretty:
pdf['end_date'] = pd.to_datetime(pdf['end_date'])
(pdf.groupby(['num','product'])
.apply(lambda g:g.nlargest(3, 'end_date').reset_index())['amount']
.unstack()
.reset_index()
.rename(columns = {0:'N', 1:'N-1', 2:'N-2'})
)
output:
num product N N-1 N-2
-- ----- --------- ------ ----- ------
0 A1 bike 15000 nan nan
1 A1 car 12000 nan nan
2 A1 other 2000 nan nan
3 A2 house 400000 nan nan
4 A3 other 56500 89000 100000
Upvotes: 2
Reputation: 9
I am not sure if I understood your question correctly. But do you want to pivot up after the last 3 date? In your actual data table you have 9 distinct dates in end_date column, so it is expected that when you pivot up on end_date column you will have 9 new columns generated as in your last screenshot.
If you want to pivot based on the last 3 dates only, you can delete other rows and then do a pivot up.
Upvotes: -1