JohnnieL
JohnnieL

Reputation: 1231

Pandas Python create column by indexed assignment on frame with multiindex

I can not figure out how to simply/efficiently add a column to a dataframe that has a multiindex by assignment from another single level index

I can add a column to a dataframe that has a single level index as follows:

df = pd.DataFrame({'data':[1, 2, 5]}, 
                  index=pd.Index(['Alpha', 'Bravo', 'Echo'], name='item1_name'))
df

enter image description here

df_item1 = pd.DataFrame({'id':[101, 102, 103, 104, 105]}, 
                  index=pd.Index(['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'], name='item1_name'))
df_item1

enter image description here

df['item1_id']=df_item1['id']
df

enter image description here

What I cant figure out is how to do this on a single level of a multiindex dataframe. e.g.

df_multi = pd.DataFrame({'data':[1, 2, 5, 11, 12, 15]}, 
                  index=pd.Index([('Alpha', 'X'), ('Alpha', 'Y'), ('Bravo', 'X'), 
                                  ('Bravo', 'Y'), ('Echo', 'X'), ('Echo', 'Y')], 
                                   name=('item1_name','item2_name')))

df_multi['item1_id']=df_item1['id']

df_multi

enter image description here

I just get NaNs as the indexes arent aligning. My big picture problem is that I am receiving data with a string name and i need to be able to replace with an integer id for both levels, item1_name and item2_name

I have long solutions using unstack/stack/reindex etc but it all seems a very long way around and I feel i ought to be able to join on the index If my second look up frame is this:

df_item2 = pd.DataFrame({'id':[201, 202]}, 
                  index=pd.Index(['X', 'Y'], name='item2_name'))

df_item2

enter image description here

what I want to end up with is

enter image description here

Upvotes: 0

Views: 68

Answers (1)

jezrael
jezrael

Reputation: 863321

Use DataFrame.join with rename here, it working by match index names with MultiIndex names:

df = (df_multi.join(df_item1.rename(columns={'id':'item1_name'}))
              .join(df_item2.rename(columns={'id':'item1_nam2'})))
print (df)
                       data  item1_name  item1_nam2
item1_name item2_name                              
Alpha      X              1         101         201
           Y              2         101         202
Bravo      X              5         102         201
           Y             11         102         202
Echo       X             12         105         201
           Y             15         105         202

If names of index not match, get error:

df_item2 = pd.DataFrame({'id':[201, 202]}, 
                  index=pd.Index(['X', 'Y'], name='aaa')) <- changed name


df = (df_multi.join(df_item2.rename(columns={'id':'item1_name'})))
print (df)

ValueError: cannot join with no overlapping index names

Upvotes: 1

Related Questions