Pivot Pandas keeping only certain columns

Question

trying to arrange the below DF

pdf = pd.DataFrame({'num' : ['A1', "A1", 'A1', 'A2', "A3", 'A3', "A3", 'A3', 'A3'], 
                    'end_date' : ['2020-12-31', '2019-09-30', '2017-08-31', '2019-12-31', '2017-12-31', '2016-12-31', '2015-12-31', '2014-12-31', '2013-12-31'],
                    'amount' : [12000, 15000, 2000, 400000, 56500, 89000, 100000, 500, 8000],
                    'product' : ['car', 'bike', 'other', 'house', 'other', 'other', 'other', 'other', 'other'] })
pdf

num end_date    amount  product
A1  2020-12-31  12000   car
A1  2019-09-30  15000   bike
A1  2017-08-31  2000    other
A2  2019-12-31  400000  house
A3  2017-12-31  56500   other
A3  2016-12-31  89000   other
A3  2015-12-31  100000  other
A3  2014-12-31  500 other
A3  2013-12-31  8000    other

The aim is to keep only the last 3 values as below

num   N         N-1         N-2       product
A1    12000     NaN         NaN       car
A1    15000     NaN         NaN       bike
A1    2000      NaN         NaN       other
A2    400000    NaN         NaN       house
A3    56500     89000       100000    other

That's what I've tried so far without any success..

pdf.pivot_table(index = ['num', 'product'], columns = ['end_date', ], values = 'amount').reset_index()



num product 2013-12-31  2014-12-31  2015-12-31  2016-12-31  2017-08-31  2017-12-31  2019-09-30  2019-12-31  2020-12-31
A1  bike    NaN         NaN         NaN         NaN         NaN         NaN         15,000.00   NaN         NaN
A1  car     NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         12,000.00
A1  other   NaN         NaN         NaN         NaN         2,000.00    NaN         NaN         NaN         NaN
A2  house   NaN         NaN         NaN         NaN         NaN         NaN         NaN         400,000.00      NaN
A3  other   8,000.00    500.00      100,000.00  89,000.00   NaN         56,500.00   NaN         NaN         NaN

piterbarg · Accepted Answer

The idea is to groupby 'num' and 'product', pick three largest dates for each group, and unstack the second level of the index. The rest is making it look pretty:

pdf['end_date'] = pd.to_datetime(pdf['end_date'])

(pdf.groupby(['num','product'])
    .apply(lambda g:g.nlargest(3, 'end_date').reset_index())['amount']
    .unstack()
    .reset_index()
    .rename(columns = {0:'N', 1:'N-1', 2:'N-2'})
    )

output:

    num    product         N    N-1     N-2
--  -----  ---------  ------  -----  ------
 0  A1     bike        15000    nan     nan
 1  A1     car         12000    nan     nan
 2  A1     other        2000    nan     nan
 3  A2     house      400000    nan     nan
 4  A3     other       56500  89000  100000

Pivot Pandas keeping only certain columns

Answers (2)

Related Questions