Reputation: 1075
I don't know if the question has been answered again, but I haven't found anything similar
I have a MultiIndex DataFrame with two level columns, for example:
arrays = [np.array(['bar', 'bar','bar', 'foo', 'foo','foo', 'qux', 'qux', 'qux']),
np.array(['one', 'two', 'three', 'one', 'two', 'three', 'one', 'two','three'])]
df = pd.DataFrame(np.random.randn(3, 9), columns=arrays)
print(df)
bar foo qux \
one two three one two three one
0 1.255724 -0.692387 -1.485324 2.265736 0.494645 1.973369 -0.326260
1 -0.903874 0.695460 -0.950076 0.181590 -2.345611 1.288061 0.980166
2 -0.294882 1.034745 1.423288 -0.895625 -0.847338 0.470444 0.373579
two three
0 0.136427 -0.136479
1 0.702732 -1.894376
2 0.506240 -0.456519
I want to select specific columns from the second level for every first level column independently.
For example, i want to get as result something like this:
bar foo qux
one two two one three
0 1.255724 -0.692387 0.494645 -0.326260 -0.136479
1 -0.903874 0.695460 -2.345611 0.980166 -1.894376
2 -0.294882 1.034745 -0.847338 0.373579 -0.456519
I have seen this questions but it isn't what i want to achieve.
Now I am doing it like this:
level0 = ['bar','foo','qux']
level1 = [['one','two'],['two'],['one','three']]
df_list=[]
for i,value in enumerate(level0):
df_list.append(df.loc[:,(value,level1[i])])
new_df = pd.concat([i for i in df_list],axis=1)
print(new_df)
But it doesn't seem to me as the best solution.
Is there any better (more "pandas") approach to solve this?
Upvotes: 4
Views: 294
Reputation: 150825
Instead of concatenating the data, you can pick out the columns first, and use column extraction;
cols = pd.concat([pd.DataFrame({'level_0':x, 'level_1':y})
for x,y in zip(level0,level1)]
).values
df[cols]
Output:
bar foo qux
one two two one three
0 0.729061 -0.876547 0.312557 0.736568 0.250469
1 0.619194 0.451023 0.803252 -1.636403 -0.854607
2 0.254690 -1.054859 -1.223274 0.398411 -1.448396
Upvotes: 3