Reputation: 3573
I am trying to join two pandas dataframes. The left one has a MultiIndex and the right one is just a plain vanilla dataframe. I would like to join the index of the right dataframe on one of the levels of the left dataframe. For example, given the following:
         Age
Boys
  Sam     21
  John    22
Girls
  Lisa    23
and
      Points
John       1
Lisa       2
Sam        3
I would like to end up with this:
         Age  Points
Boys
  Sam     21       3
  John    22       1
Girls
  Lisa    23       2
This is how I have worked it out so far; I am just wondering if there is a more straightforward way:
In[2]: import pandas as pd
In[3]: idx = pd.MultiIndex(levels=[['Boys', 'Girls', ''],['Sam', 'John', 'Lisa', '']], codes=[[0,2,2,1,2],[3,0,1,3,2]])
df1 = pd.DataFrame({'Age':['',21,22,'',23]}, index=idx)
df2 = pd.DataFrame({'Points':[1, 2, 3]}, index=['John','Lisa','Sam'])
In[4]: df1
Out[4]:
         Age
Boys
  Sam     21
  John    22
Girls
  Lisa    23
In[5]: df2
Out[5]:
      Points
John       1
Lisa       2
Sam        3
I have then written this loop, which "transforms" the right dataframe by giving it a multi-index, with the values rearranged to match:
lvl = df1.index.levels[1]
lbl = df1.index.codes[1]
y = df2.iloc[:, 0].values.tolist()
z = []
# Walk the second-level label of every row in df1
for x in [lvl[k] for k in lbl]:
    try:
        pos = df2.index.tolist().index(x)
    except ValueError:
        # No matching row in df2 (the blank placeholder labels)
        z.append('')
    else:
        z.append(y[pos])
temp = pd.DataFrame(index=df1.index)
temp['Points'] = z
I can now join them:
out = df1.join(temp)
out
Out[6]:
         Age  Points
Boys
  Sam     21       3
  John    22       1
Girls
  Lisa    23       2
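The loop above amounts to looking up each second-level label of df1 in df2. A minimal sketch of that same lookup without an explicit loop is given here, assuming the frames are built as in the question (pd.MultiIndex.from_tuples is used only for brevity and is not how the question constructs the index). One difference to note: labels with no match come back as NaN rather than the empty string the loop writes.

```python
import pandas as pd

# Same rows as in the question; from_tuples is an assumption made for brevity.
idx = pd.MultiIndex.from_tuples(
    [("Boys", ""), ("", "Sam"), ("", "John"), ("Girls", ""), ("", "Lisa")]
)
df1 = pd.DataFrame({"Age": ["", 21, 22, "", 23]}, index=idx)
df2 = pd.DataFrame({"Points": [1, 2, 3]}, index=["John", "Lisa", "Sam"])

# Look up every second-level label in df2; labels without a match
# (the blank placeholders) become NaN instead of ''.
names = df1.index.get_level_values(1)
points = df2["Points"].reindex(names).to_numpy()

# Assign positionally so the rows stay in df1's order.
out = df1.assign(Points=points)
print(out)
```

Because the values are assigned positionally, no intermediate frame or join is needed for this variant.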
Upvotes: 1
Views: 110
Reputation: 210912
Name your index levels; this helps pandas work out how to join your data frames:
In [72]: df1
Out[72]:
            Age
sex   name
Boys
      Sam    21
      John   22
Girls
      Lisa   23
In [73]: df1.index.names=['sex','name']
In [74]: df2.index.name = 'name'
Joining can be pretty easy now:
In [75]: df1.join(df2)
Out[75]:
            Age  Points
sex   name
Boys                NaN
      Sam    21       3
      John   22       1
Girls               NaN
      Lisa   23       2
P.S. The NaNs are the result of the empty placeholder rows in df1.
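The naming-and-join steps can be reproduced end to end with a short script. This is a sketch under the assumption that df1 contains the same blank placeholder rows as in the question; the index is built with pd.MultiIndex.from_tuples for brevity, and the variable names joined and blank are introduced here purely for illustration.

```python
import pandas as pd

# Rebuild the question's left frame, blank placeholder rows included.
# The level names are set at construction instead of via df1.index.names.
idx = pd.MultiIndex.from_tuples(
    [("Boys", ""), ("", "Sam"), ("", "John"), ("Girls", ""), ("", "Lisa")],
    names=["sex", "name"],
)
df1 = pd.DataFrame({"Age": ["", 21, 22, "", 23]}, index=idx)

df2 = pd.DataFrame({"Points": [1, 2, 3]}, index=["John", "Lisa", "Sam"])
df2.index.name = "name"  # must match the level name in df1

# pandas matches df2's index against the "name" level of df1.
joined = df1.join(df2)

# Rows whose "name" label is blank have no match in df2, hence NaN.
blank = joined.index.get_level_values("name") == ""
print(joined[blank])
print(joined[~blank])
```

If the placeholder rows are not needed, building df1 without them should avoid the NaNs altogether.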
Upvotes: 2