Reputation: 440
This question is an attempt to generalise the solution provided for the this question:
Pandas: add a column to a multiindex column dataframe
I need to produce a column for each column index.
The solution provided by spencerlyon2
works when we want to add a single column:
df['bar', 'three'] = [0, 1, 2]
However I would like to generalise this operation for every first level column index.
Source DF:
In [1]: df
Out[2]:
first bar baz
second one two one two
A -1.089798 2.053026 0.470218 1.440740
B 0.488875 0.428836 1.413451 -0.683677
C -0.243064 -0.069446 -0.911166 0.478370
Target DF below, requires that the three
column is the addition of the one
and two
columns of its respective index.
In [1]: df
Out[2]:
first bar baz
second one two three one two three
A -1.089798 2.053026 0.963228 1.440740 -2.317647 -0.876907
B 0.488875 0.428836 0.917711 -0.683677 0.345873 -0.337804
C -0.243064 -0.069446 -0.312510 0.478370 0.266761 0.745131
Upvotes: 8
Views: 6444
Reputation: 31011
I started from your sample input:
first bar baz
second one two one two
A -1.089798 2.053026 0.470218 1.440740
B 0.488875 0.428836 1.413451 -0.683677
C -0.243064 -0.069446 -0.911166 0.478370
To add a new column to each level 0 of the column MultiIndex, you can run something like:
for c1 in df.columns.get_level_values('first').unique():
# New column int index
cInd = int(df.columns.get_loc(c1).stop)
col = (c1, 'three') # New column name
newVal = df[(c1, 'one')] + df[(c1, 'two')]
df.insert(loc=cInd, column=col, value=newVal) # Insert the new column
In the above example, values in new columns are consecutive numbers, but in your case set them as you wish.
The result of my code (after the column sort) is:
first bar baz
second one two three one two three
A -1.089798 2.053026 0.963228 0.470218 1.440740 1.910958
B 0.488875 0.428836 0.917711 1.413451 -0.683677 0.729774
C -0.243064 -0.069446 -0.312510 -0.911166 0.478370 -0.432796
Upvotes: 1
Reputation: 59304
You can use join
with two data frames with same indexes to create a bunch of columns all at once.
First, calculate the sum using groupby
against axis=1
ndf = df.groupby(df.columns.get_level_values(0), axis=1).sum()
bar baz
A 0.963228 1.910958
B 0.917711 0.729774
C -0.312510 -0.432796
(PS: If you have more than two columns, you may do
df.loc[:, (slice(None), ['one', 'two'])].groupby(df.columns.get_level_values(0), axis=1).sum()
to slice only columns 'one' and 'two' first, and just then groupby
)
Then, make it match your column indexes, i.e. make it a MultiIndexed data frame just like your original data frame
ndf.columns = pd.MultiIndex.from_product([ndf.columns, ['three']])
bar baz
three three
A 0.963228 1.910958
B 0.917711 0.729774
C -0.312510 -0.432796
Finally, df.join
finaldf = df.join(ndf).sort_index(axis=1)
If you really care about the ordering, use reindex
finaldf.reindex(['one', 'two', 'three'], axis=1, level=1)
first bar baz
second one two three one two three
A -1.089798 2.053026 0.963228 0.470218 1.440740 1.910958
B 0.488875 0.428836 0.917711 1.413451 -0.683677 0.729774
C -0.243064 -0.069446 -0.312510 -0.911166 0.478370 -0.432796
Upvotes: 3