Reputation: 764
I generated a pandas pivot table that looks like this:
Encounters
Code 132 133 145
Record Number Start_date End_date Service_Date
2322 1/1/2017 1/3/2017 1/1/2017 0 1 1
1/2/2017 1 0 0
1/3/2017 0 1 1
I would like to merge and sum some of the pivot table columns based on the Code
Desired output:
Encounters
Code 132 133-145
Record Number Start_date End_date Service_Date
2322 1/1/2017 1/3/2017 1/1/2017 0 2
1/2/2017 1 0
1/3/2017 0 2
Upvotes: 0
Views: 2218
Reputation: 107687
Pivot tables create hierarchical columns (i.e., multiple levels). Hence, consider assigning a new, sum column using the tuple assignment for the different levels:
df[('Encounters', '133-145')] = df[('Encounters', '133')] + df[('Encounters', '145')]
del df[('Encounters', '133')]
del df[('Encounters', '145')]
df.sortlevel(0, axis=1, inplace=True)
To demonstrate with random data:
Data (seeded data with pivot)
import numpy as np
import pandas as pd
import datetime as dt
import time
LETTERS = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
epoch_time = int(time.time())
np.random.seed(555)
df = pd.DataFrame({'ID': [np.random.randint(15) for _ in range(50)],
'GROUP': ["".join(np.random.choice(LETTERS[0:3],1)) for _ in range(50)],
'NUM': np.random.uniform(50)/100,
'DATE': [dt.datetime.fromtimestamp(np.random.randint(low=1400270738,
high=epoch_time)) for _ in range(50)]})
df['YEAR'] = df['DATE'].dt.year
pvtdf = df.pivot_table(index = ['ID'], columns = ['YEAR', 'GROUP'], values = ['NUM']).fillna(0)
print(pvtdf)
# NUM
# YEAR 2014 2015 2016 2017
# GROUP A B C A B C A B C A B C
# ID
# 0 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258
# 1 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258
# 3 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.411258 0.000000 0.411258 0.000000
# 4 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000
# 5 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 6 0.000000 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 7 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 8 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 9 0.000000 0.000000 0.411258 0.411258 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000
# 10 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000
# 11 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000
# 12 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 13 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000 0.411258 0.000000 0.411258 0.000000
# 14 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000
Process (all 2017 A, B, C columns added into D and then removed)
pvtdf[('NUM', 2017, 'D')] = pvtdf[('NUM', 2017, 'A')] + pvtdf[('NUM', 2017, 'B')] + pvtdf[('NUM', 2017, 'C')]
pvtdf = pvtdf.drop([('NUM', 2017, 'A'), ('NUM', 2017, 'B'), ('NUM', 2017, 'C')], axis=1)
pvtdf.sortlevel(0, axis=1, inplace=True)
print(pvtdf)
# NUM
# YEAR 2014 2015 2016 2017
# GROUP A B C A B C A B C D
# ID
# 0 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.822515
# 1 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.822515
# 3 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258 0.411258 0.411258 0.411258
# 4 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.411258
# 5 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 6 0.000000 0.411258 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000 0.000000
# 7 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 8 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
# 9 0.000000 0.000000 0.411258 0.411258 0.000000 0.411258 0.411258 0.000000 0.000000 0.000000
# 10 0.000000 0.000000 0.000000 0.411258 0.411258 0.000000 0.000000 0.411258 0.000000 0.000000
# 11 0.000000 0.000000 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000
# 12 0.411258 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
# 13 0.000000 0.411258 0.000000 0.000000 0.000000 0.000000 0.411258 0.000000 0.411258 0.411258
# 14 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.411258
Upvotes: 1