Reputation: 5559
I am trying to concatenate two DataFrames, df1 and df2. df1 is a MultiIndex DataFrame, and df2 has fewer rows than df1.
import pandas as pd
import numpy as np
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df1 = pd.DataFrame(np.random.randn(8), index=index)
df1
Out[15]:
                     0
first second
bar   one    -0.185560
      two    -2.358254
baz   one     1.130550
      two     1.441708
foo   one    -1.163076
      two     1.776814
qux   one    -0.811836
      two     0.389500
df2 = pd.DataFrame(data=[0,1,0,1],index=['bar','baz','foo', 'qux'],columns=['label'])
df2
Out[18]:
     label
bar      0
baz      1
foo      0
qux      1
The desired result would be something like:
df3
Out[18]:
                     0  label
first second
bar   one    -0.185560      0
      two    -2.358254      0
baz   one     1.130550      1
      two     1.441708      1
foo   one    -1.163076      0
      two     1.776814      0
qux   one    -0.811836      1
      two     0.389500      1
Upvotes: 1
Views: 51
Reputation: 210832
In [132]: df1['label'] = df1.index.get_level_values(0).to_series().map(df2['label']).values
In [133]: df1
Out[133]:
                     0  label
first second
bar   one     0.143211      0
      two     1.133454      0
baz   one     1.298973      1
      two    -0.717844      1
foo   one    -0.663768      0
      two     0.687015      0
qux   one     0.412729      1
      two     0.366502      1
Or a better option (thanks to @Dark for the hint):
df1['label'] = df1.index.get_level_values(0).map(df2['label'].get)
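For anyone who wants to try the lookup in isolation, here is a minimal, self-contained sketch. It rebuilds the question's frames with `MultiIndex.from_product` and a seeded generator, both of which are my own choices for repeatability and not part of the original post, and it selects the level by name (`'first'`) instead of position:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames; the seed only makes the run repeatable.
index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'],
)
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal(8), index=index)
df2 = pd.DataFrame(data=[0, 1, 0, 1],
                   index=['bar', 'baz', 'foo', 'qux'],
                   columns=['label'])

# One entry per row of df1, holding that row's first-level key.
first_level = df1.index.get_level_values('first')

# Look each key up in df2['label']; the result is assigned by position.
df1['label'] = first_level.map(df2['label'].get)

print(df1['label'].tolist())  # [0, 0, 1, 1, 0, 0, 1, 1]
```

The MultiIndex of df1 is left untouched; only a new column is added.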
Upvotes: 2
Reputation: 393963
Another method is to call reset_index on the second level. You can then add the column, which aligns on the first-level index values, and set the index back again:
In[52]:
df3 = df1.reset_index(level=1)
df3['label'] = df2['label']
df3 = df3.set_index([df3.index, 'second'])
df3
Out[52]:
                     0  label
first second
bar   one     0.957417      0
      two    -0.466755      0
baz   one     1.064326      1
      two     1.036983      1
foo   one    -1.319737      0
      two     0.064465      0
qux   one    -0.237232      1
      two    -0.511889      1
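The same three steps can be sketched as a standalone script. The data column here is `np.arange(8.0)` rather than random numbers (an assumption, chosen only so the result is deterministic), and the last step also shows `set_index('second', append=True)` as a possible alternative spelling of the final `set_index` call:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'],
)
df1 = pd.DataFrame(np.arange(8.0), index=index)  # deterministic stand-in data
df2 = pd.DataFrame(data=[0, 1, 0, 1],
                   index=['bar', 'baz', 'foo', 'qux'],
                   columns=['label'])

# 1. Move 'second' out of the index so only 'first' remains as the index.
tmp = df1.reset_index(level='second')

# 2. Plain column assignment aligns df2['label'] on the 'first' index values.
tmp['label'] = df2['label']

# 3a. Restore the MultiIndex as in the answer above.
df3 = tmp.set_index([tmp.index, 'second'])

# 3b. Alternative spelling: append 'second' to the existing index.
df3_alt = tmp.set_index('second', append=True)

print(df3['label'].tolist())  # [0, 0, 1, 1, 0, 0, 1, 1]
```

Both spellings should give the same two-level index; `append=True` just avoids passing the existing index back in by hand.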
Upvotes: 2