cncggvg
cncggvg

Reputation: 687

pandas, how to add columns to a multiindex column DataFrame

This is my original DataFrame (with multiindex column):

In [72]:df
Out[72]: 
          a                   b          
          x         y         x         y
0  1.545293 -0.459270  0.899254 -1.010453
1  0.458760  0.275400 -0.190951  0.169195
2 -0.941817  1.109823  0.077953 -0.247074
3  1.790101 -1.643470  0.979625 -1.704657
4 -2.044814 -0.243726 -0.039724  0.600066

and I have another DataFrame:

In [77]:df2
Out[77]: 
          x         y
0 -1.085869 -0.952949
1  0.601585  0.570050
2  0.328601  0.802610
3 -0.415952 -0.090088
4  0.757545 -0.736933

how can I add df2's columns to df to get a new DataFrame like this:

In [83]:df3
Out[83]: 
          a                   b                   c          
          x         y         x         y         x         y
0  1.545293 -0.459270  0.899254 -1.010453 -1.085869 -0.952949
1  0.458760  0.275400 -0.190951  0.169195  0.601585  0.570050
2 -0.941817  1.109823  0.077953 -0.247074  0.328601  0.802610
3  1.790101 -1.643470  0.979625 -1.704657 -0.415952 -0.090088
4 -2.044814 -0.243726 -0.039724  0.600066  0.757545 -0.736933

My current approach is to use a for loop:

for col in df2.columns:
    df['c', col] = df2[col]

is there any method to avoid the loop?

Upvotes: 7

Views: 4534

Answers (4)

Andi
Andi

Reputation: 4855

First, add a column level ("c") to df2. Then, concatenate both dataframes.

df2.columns = pd.MultiIndex.from_product(["c"], df.columns])
df = pd.concat([df, df2], axis=1)

Upvotes: 1

Adriel M. Vieira
Adriel M. Vieira

Reputation: 121

I think stacking is the best solution here:

# First stack df and df2
df, df2 = df.stack(), df2.stack()

# Then add df2 to df as a column
df2['c'] = df2

# Now unstack
df2 = df2.unstack()

Upvotes: 0

Luke Davis
Luke Davis

Reputation: 2666

I discovered another way to do this in the general case (running Python 3.6), without having to explicitly deconstruct the DataFrame. You can use pd.concat with the dictionary argument,

df3 = pd.concat({**df1, **{('c',nm):val for nm,val in df2.items()})

** expansion on DataFrame objects seems to return a dictionary of Series objects with "names" equal to the column name string/value, or if the columns are MultiIndexed, the tuple containing the hieararchy of column string/values. Then, when read back into pd.concat as a dictionary, Pandas re-constructs the MultiIndexed columns from the tuples.

Note this is much less efficient than the direct assignment you were doing! Since it has to deconstruct each column and MultiIndex of the dataframe, then re-combine.

Upvotes: 1

Kartik
Kartik

Reputation: 8683

Try pd.concat:

pieces = {'a' : df1['a'],
          'b' : df1['b'],
          'c' : df2}
df3 = pd.concat(pieces, axis=1)

Upvotes: 1

Related Questions