SO_tourist
SO_tourist

Reputation: 440

Pandas: add multiple columns to a multiindex column dataframe

This question is an attempt to generalise the solution provided for the this question:

Pandas: add a column to a multiindex column dataframe

I need to produce a column for each column index.

The solution provided by spencerlyon2 works when we want to add a single column:

df['bar', 'three'] = [0, 1, 2]

However I would like to generalise this operation for every first level column index.

Source DF:

In [1]: df
Out[2]:
first        bar                 baz
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370

Target DF below, requires that the three column is the addition of the one and two columns of its respective index.

In [1]: df
Out[2]:
first        bar                           baz                 
second       one       two     three       one       two      three
A      -1.089798  2.053026  0.963228‬  1.440740 -2.317647  -0.876907‬
B       0.488875  0.428836  0.917711 -0.683677  0.345873  -0.337804‬
C      -0.243064 -0.069446 -0.312510  0.478370  0.266761   0.745131‬

Upvotes: 8

Views: 6444

Answers (2)

Valdi_Bo
Valdi_Bo

Reputation: 31011

I started from your sample input:

first        bar                 baz          
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370

To add a new column to each level 0 of the column MultiIndex, you can run something like:

for c1 in df.columns.get_level_values('first').unique():
    # New column int index
    cInd = int(df.columns.get_loc(c1).stop)
    col = (c1, 'three')      # New column name
    newVal = df[(c1, 'one')] + df[(c1, 'two')]
    df.insert(loc=cInd, column=col, value=newVal)  # Insert the new column

In the above example, values in new columns are consecutive numbers, but in your case set them as you wish.

The result of my code (after the column sort) is:

first        bar                           baz                    
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796

Upvotes: 1

rafaelc
rafaelc

Reputation: 59304

You can use join with two data frames with same indexes to create a bunch of columns all at once.


First, calculate the sum using groupby against axis=1

ndf = df.groupby(df.columns.get_level_values(0), axis=1).sum()

        bar       baz
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796

(PS: If you have more than two columns, you may do

df.loc[:, (slice(None), ['one', 'two'])].groupby(df.columns.get_level_values(0), axis=1).sum()

to slice only columns 'one' and 'two' first, and just then groupby)

Then, make it match your column indexes, i.e. make it a MultiIndexed data frame just like your original data frame

ndf.columns = pd.MultiIndex.from_product([ndf.columns, ['three']])

        bar       baz
      three     three
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796

Finally, df.join

finaldf = df.join(ndf).sort_index(axis=1)

If you really care about the ordering, use reindex

finaldf.reindex(['one', 'two', 'three'], axis=1, level=1)

first        bar                           baz                    
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796

Upvotes: 3

Related Questions