Reputation: 18406
I have following DataFrame:
ETHNIC RACE AGE TRT01A
0 NOT HISPANIC OR LATINO WHITE 31.824778 Treatment B
1 NOT HISPANIC OR LATINO WHITE 31.381246 Placebo
2 HISPANIC OR LATINO WHITE 45.522245 Treatment A
3 HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN 42.910335 Treatment B
4 NOT HISPANIC OR LATINO WHITE 31.381246 Placebo
5 NOT HISPANIC OR LATINO WHITE 38.045175 Treatment B
6 HISPANIC OR LATINO WHITE 39.337440 Placebo
7 NOT HISPANIC OR LATINO WHITE 47.121150 Placebo
8 NOT HISPANIC OR LATINO WHITE 38.203970 Treatment A
9 NOT HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN 22.926762 Placebo
10 HISPANIC OR LATINO WHITE 45.226557 Treatment B
11 HISPANIC OR LATINO WHITE 32.112252 Placebo
Just copy above dataframe to clipboard and run df=pd.read_clipboard('\s\s+')
to get the dataframe into a variable.
out = (df.groupby(['TRT01A','ETHNIC', 'RACE'])['AGE']
.agg(mean=np.mean,
n='count',
deviation=np.std,
Q1=lambda x: np.percentile(x, 0.25)
)
.T.unstack().unstack(0)
)
I performed some aggregates in the above dataframe, and transposed, and successively unstacked them to get the following result:
TRT01A Placebo Treatment A Treatment B
ETHNIC RACE
HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN mean NaN NaN 42.910335
n NaN NaN 1.000000
deviation NaN NaN NaN
Q1 NaN NaN 42.910335
WHITE mean 35.724846 45.522245 45.226557
n 2.000000 1.000000 1.000000
deviation 5.108979 NaN NaN
Q1 32.130315 45.522245 45.226557
NOT HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN mean 22.926762 NaN NaN
n 1.000000 NaN NaN
deviation NaN NaN NaN
Q1 22.926762 NaN NaN
WHITE mean 36.627881 38.203970 34.934976
n 3.000000 1.000000 2.000000
deviation 9.087438 NaN 4.398485
Q1 31.381246 38.203970 31.840329
Now, I want to unstack all the indices to get the following structure (i.e. inserting NaN
rows for all the index columns from first to second last, alongwith Level
column denoting the level of the index):
Placebo Treatment A Treatment B Level
HISPANIC OR LATINO NaN NaN NaN 0 <---
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1 <---
mean NaN NaN 42.910335 2
n NaN NaN 1.000000 2
deviation NaN NaN NaN 2
Q1 NaN NaN 42.910335 2
WHITE NaN NaN NaN 1 <---
mean 35.724846 45.522245 45.226557 2
n 2.000000 1.000000 1.000000 2
deviation 5.108979 NaN NaN 2
Q1 32.130315 45.522245 45.226557 2
NOT HISPANIC OR LATINO NaN NaN NaN 0 <---
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1 <---
mean 22.926762 NaN NaN 2
n 1.000000 NaN NaN 2
deviation NaN NaN NaN 2
Q1 22.926762 NaN NaN 2
WHITE NaN NaN NaN 1 <---
mean 36.627881 38.203970 34.934976 2
n 3.000000 1.000000 2.000000 2
deviation 9.087438 NaN 4.398485 2
Q1 31.381246 38.203970 31.840329 2
This question is identical to the previous question that I asked, but the problem is, there can be from 1 to 4 indices columns after aggregating, (i.e. aggregate may be applied on from 1 to 5 columns), and it's being difficult to use the same previous solution in this scenario.
Upvotes: 2
Views: 216
Reputation: 862571
Use custom function with DataFrame.append
first with custom DataFrame
filled by default NaN
values:
def f(x):
names = pd.DataFrame(index=x.name, columns=x.columns).assign(Level=[0,1])
#print (names)
return names.append(x.reset_index(level=[0,1], drop=True).assign(Level=2))
out = out.groupby(level=[0,1], group_keys=False).apply(f)
And then remove duplicated 0
Levels:
out = out[~out.index.duplicated() | out['Level'].isin([1,2])]
print (out)
TRT01A Placebo Treatment A Treatment B Level
HISPANIC OR LATINO NaN NaN NaN 0
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1
mean NaN NaN 42.910335 2
n NaN NaN 1.000000 2
deviation NaN NaN NaN 2
Q1 NaN NaN 42.910335 2
WHITE NaN NaN NaN 1
mean 35.724846 45.522245 45.226557 2
n 2.000000 1.000000 1.000000 2
deviation 5.108979 NaN NaN 2
Q1 32.130315 45.522245 45.226557 2
NOT HISPANIC OR LATINO NaN NaN NaN 0
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1
mean 22.926762 NaN NaN 2
n 1.000000 NaN NaN 2
deviation NaN NaN NaN 2
Q1 22.926762 NaN NaN 2
WHITE NaN NaN NaN 1
mean 36.627881 38.203970 34.934976 2
n 3.000000 1.000000 2.000000 2
deviation 9.087438 NaN 4.398485 2
Q1 31.381246 38.203970 31.840329 2
Upvotes: 2