AJG519

Reputation: 3379

Calculate Column in Pandas between Single Index and MultiIndex DataFrames

Say I have the following empty DataFrame with a MultiIndex:

>>> import pandas as pd
>>> df1=pd.DataFrame(data=[['a',1], ['a',2], ['b',1], ['b',2]], columns=['key1','key2']).set_index(['key1','key2'])
>>> print(df1)
Empty DataFrame
Columns: []
Index: [(a, 1), (a, 2), (b, 1), (b, 2)]

And I have the following DataFrame that contains my data:

>>> data=pd.DataFrame(index=['a','b'], data=[11,22], columns=['Var1'])
>>> data.index.name='key1'
>>> print(data)
      Var1
key1      
a       11
b       22

Since the common index level is named "key1" in both DataFrames, I expected that I could create a column in my empty DataFrame equal to Var1 by doing the following:

>>> df1['TestVar']=data['Var1']
>>> print(df1)
           TestVar
key1 key2         
a    1         NaN
     2         NaN
b    1         NaN
     2         NaN

However, this does not work: every value comes out as NaN. Is there something I am doing wrong here? Instead, I resort to the following to get my desired output:

>>> df1.reset_index([1]).join(data).set_index('key2',append=True)
           Var1
key1 key2      
a    1       11
     2       11
b    1       22
     2       22

Is there a better way to do this?

Upvotes: 0

Views: 59

Answers (1)

Dickster

Reputation: 3009

How about naming the index of the data DataFrame and then using join?

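import pandas as pd

df1 = pd.DataFrame(data=[['a', 1], ['a', 2], ['b', 1], ['b', 2]], columns=['key1', 'key2']).set_index(['key1', 'key2'])
data = pd.DataFrame(index=['a', 'b'], data=[11, 22], columns=['Var1'])

# Name the index of data so that it matches the 'key1' level of df1
data.index.names = ['key1']

# join matches data's named index against the same-named level of df1
print(df1.join(data))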


           Var1
key1 key2      
a    1       11
     2       11
b    1       22
     2       22
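
If you would rather assign the column directly, as the question attempts, here is a minimal sketch of one possible alternative. As far as I understand it, the plain assignment gives NaN because pandas aligns the Series on the full index of df1, whose entries are (key1, key2) pairs, so the labels 'a' and 'b' never match. The sketch looks up Var1 by the 'key1' level only and assigns the result by position. The variable name key1_labels is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['b', 1], ['b', 2]],
    columns=['key1', 'key2'],
).set_index(['key1', 'key2'])
data = pd.DataFrame(index=['a', 'b'], data=[11, 22], columns=['Var1'])
data.index.name = 'key1'

# Extract the 'key1' label of every row in df1 (one label per row).
key1_labels = df1.index.get_level_values('key1')

# Look up Var1 for each of those labels, then drop the index with
# to_numpy() so the assignment is positional rather than index-aligned.
df1['TestVar'] = data['Var1'].reindex(key1_labels).to_numpy()

print(df1)
```

This avoids the reset_index / set_index round trip, but the join above is probably easier to read when you need several columns from data at once.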

Upvotes: 2
