Ger
Ger

Reputation: 9746

Swap and group column names in a pandas DataFrame

I have a data frame with some quantitative data and one qualitative data. I would like to use describe to compute stats and group by column using the qualitative data. But I do not obtain the order I want for the level. Hereafter is an example:

df = pd.DataFrame({k: np.random.random(10) for k in "ABC"})
df["qual"] = 5 * ["init"] + 5 * ["final"]

The DataFrame looks like:

          A         B         C   qual
0  0.298217  0.675818  0.076533   init
1  0.015442  0.264924  0.624483   init
2  0.096961  0.702419  0.027134   init
3  0.481312  0.910477  0.796395   init
4  0.166774  0.319054  0.645250   init
5  0.609148  0.697818  0.151092  final
6  0.715744  0.067429  0.761562  final
7  0.748201  0.803647  0.482738  final
8  0.098323  0.614257  0.232904  final
9  0.033003  0.590819  0.943126  final

Now I would like to group by the qual column and compute statistical descriptors using describe. I did the following:

ddf = df.groupby("qual").describe().transpose()
ddf.unstack(level=0)

And I got

qual      final                          init                    
              A         B         C         A         B         C
count  5.000000  5.000000  5.000000  5.000000  5.000000  5.000000
mean   0.440884  0.554794  0.514284  0.211741  0.574539  0.433959
std    0.347138  0.284931  0.338057  0.182946  0.274135  0.355515
min    0.033003  0.067429  0.151092  0.015442  0.264924  0.027134
25%    0.098323  0.590819  0.232904  0.096961  0.319054  0.076533
50%    0.609148  0.614257  0.482738  0.166774  0.675818  0.624483
75%    0.715744  0.697818  0.761562  0.298217  0.702419  0.645250
max    0.748201  0.803647  0.943126  0.481312  0.910477  0.796395

I am close to what I want but I would like to swap and group the column index such as:

                       A               B               C
qual      initial  final  initial  final  initial  final

Is there a way to do it ?

Upvotes: 1

Views: 235

Answers (2)

mrcrnkovich
mrcrnkovich

Reputation: 46

Try ddf.stack().unstack(level=[0,2]), inplace of ddf.unstack(level=0)

Upvotes: 0

Erfan
Erfan

Reputation: 42886

Use columns.swaplevel and then sort_index by level=0 and axis='columns':

ddf = df.groupby('qual').describe().T.unstack(level=0)

ddf.columns = ddf.columns.swaplevel(0,1)
ddf = ddf.sort_index(level=0, axis='columns')

Or in one line using DataFrame.swaplevel instead of index.swaplevel:

ddf = ddf.swaplevel(0,1, axis=1).sort_index(level=0, axis='columns')
          A          B          C     
qual  final init final init final init
count  5.00 5.00  5.00 5.00  5.00 5.00
mean   0.44 0.21  0.55 0.57  0.51 0.43
std    0.35 0.18  0.28 0.27  0.34 0.36
min    0.03 0.02  0.07 0.26  0.15 0.03
25%    0.10 0.10  0.59 0.32  0.23 0.08
50%    0.61 0.17  0.61 0.68  0.48 0.62
75%    0.72 0.30  0.70 0.70  0.76 0.65
max    0.75 0.48  0.80 0.91  0.94 0.80

Upvotes: 2

Related Questions