Reputation: 311
I have a DataFrame like:
SK_ID_CURR CREDIT_ACTIVE CREDIT_DAY_OVERDUE
436084 Sold 0
436084 Active 951
436084 Sold 0
436084 Active 0
436084 Bad debt 0
436084 Active 936
436084 Active 951
I'd like to make new columns for each CREDIT_ACTIVE category, with the corresponding sum of values of CREDIT_DAY_OVERDUE.
The result should look like:
SK_ID_CURR CREDIT_ACTIVE_OD CREDIT_BAD_DEBT_OD CREDIT_ACTIVE_SOLD_OD
436084 2838 0 0
Upvotes: 3
Views: 155
Reputation: 164693
Using pd.pivot_table
:
res = pd.pivot_table(df, index='SK_ID_CURR', columns='CREDIT_ACTIVE',
values='CREDIT_DAY_OVERDUE', aggfunc='sum')
print(res)
CREDIT_ACTIVE Active BadDebt Sold
SK_ID_CURR
436084 2838 0 0
Upvotes: 0
Reputation: 862741
Use groupby
and aggregate sum
, last reshape by unstack
:
df = (df.groupby(['SK_ID_CURR','CREDIT_ACTIVE'])['CREDIT_DAY_OVERDUE']
.sum()
.unstack(fill_value=0))
Or use pivot_table
:
df = df.pivot_table(index='SK_ID_CURR',
columns='CREDIT_ACTIVE',
values='CREDIT_DAY_OVERDUE',
aggfunc='sum',
fill_value=0)
Then change columns names:
df.columns = ['CREDIT_{}_OD'.format(x.upper()) for x in df.columns]
And last create column from index:
df = df.reset_index()
print (df)
SK_ID_CURR CREDIT_ACTIVE_OD CREDIT_BAD DEBT_OD CREDIT_SOLD_OD
0 436084 2838 0 0
Upvotes: 4