Reputation: 5314
I want to build a large pandas DataFrame in a loop. In the first iteration the DataFrame df1 is still empty. When I join df1 with df2, which has MultiIndex columns, the column index gets squashed into plain tuples.
import numpy as np
import pandas as pd

df1 = pd.DataFrame(index=range(6))
df2 = pd.DataFrame(np.random.randn(6, 3),
                   columns=pd.MultiIndex.from_arrays((['A', 'A', 'A'],
                                                      ['a', 'b', 'c'])))
df1[df2.columns] = df2
df1
(A, a) (A, b) (A, c)
0 -0.673923 1.392369 1.848935
1 1.427368 0.042691 0.130962
2 -0.258589 0.216157 0.196101
3 -1.022283 1.312113 -0.770108
4 0.511127 -0.633477 -0.229149
5 -1.364237 0.713107 2.124274
I was hoping for a DataFrame with the MultiIndex intact like this:
A
a b c
0 -0.673923 1.392369 1.848935
1 1.427368 0.042691 0.130962
2 -0.258589 0.216157 0.196101
3 -1.022283 1.312113 -0.770108
4 0.511127 -0.633477 -0.229149
5 -1.364237 0.713107 2.124274
What am I doing wrong?
Upvotes: 3
Views: 190
Reputation: 323326
A MultiIndex is not always recognized when you assign into a DataFrame whose columns are a plain Index, so create df1 with empty two-level MultiIndex columns first:
df1 = pd.DataFrame(index=range(6), columns=pd.MultiIndex.from_arrays([[], []]))
df1[df2.columns] = df2
df1
Out[697]:
A
a b c
0 -0.755397 0.574920 0.901570
1 -0.165472 -1.865715 1.583416
2 -0.403287 1.358329 0.706650
3 0.028019 1.432543 -0.586325
4 -0.414851 0.825253 0.745090
5 0.389917 0.940657 0.125837
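If the goal is only to build the frame up in a loop, a different approach that avoids the empty starting frame is to collect the pieces in a list and join them once with pd.concat. This is a minimal sketch, not the approach shown above: the loop over "A" and "B" and the names pieces and result are made up for illustration, and it assumes every piece shares the same row index.

```python
import numpy as np
import pandas as pd

# Hypothetical loop: each iteration yields a frame with two-level columns.
pieces = []
for top in ["A", "B"]:
    cols = pd.MultiIndex.from_arrays(([top] * 3, ["a", "b", "c"]))
    pieces.append(pd.DataFrame(np.random.randn(6, 3), columns=cols))

# Concatenate once, side by side; the two-level column index is preserved.
result = pd.concat(pieces, axis=1)
print(type(result.columns).__name__)
```

Concatenating once at the end also avoids growing the DataFrame column by column inside the loop, which usually gets slow for large frames.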
Upvotes: 3