orange

Reputation: 8090

Set Multi-Index DataFrame column by Series with Index

I'm struggling with a MultiIndex DataFrame (a) whose column x has to be set from b, which is not a MultiIndex and has only one index level (the first level of a). I have an index (ix) that selects the rows to change, which is why I am using .loc[] for indexing. The problem is that the missing index level is not filled in the way I need when b is aligned to a (see example).

>>> a = pd.DataFrame({'a': [1, 2, 3], 'b': ['b', 'b', 'b'], 'x': [4, 5, 6]}).set_index(['a', 'b'])
>>> a
     x
a b   
1 b  4
2 b  5
3 b  6

>>> b = pd.DataFrame({'a': [1, 4], 'x': [9, 10]}).set_index('a')
>>> b
    x
a    
1   9
4  10

>>> ix = a.index[[0, 1]]
>>> ix
MultiIndex(levels=[[1, 2, 3], [u'b']],
           codes=[[0, 1], [0, 0]],
           names=[u'a', u'b'])

>>> a.loc[ix]
     x
a b   
1 b  4
2 b  5
>>> a.loc[ix, 'x'] = b['x']
>>> # wrong result (at least not what I want)
>>> a
       x
a b     
1 b  NaN
2 b  NaN
3 b  6.0

>>> # expected result
>>> a
     x
a b   
1 b  9  # index: a=1 is part of DataFrame b
2 b  5  # other indices don't exist in b and...
3 b  6  # ... x-values remain unchanged
        # if there were more [1, ...] indices...
        # ...x would also be set to 9

Upvotes: 2

Views: 524

Answers (4)

Johnny

Reputation: 694

I think you want to merge a and b. Consider using concat, merge, or join.
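One possible reading of this suggestion, as a rough sketch rather than a definitive solution: left-merge b onto a by the shared key a, then overwrite x only for rows that are in ix and found a match in b. The names x_new, merged and mask are made up for this illustration and do not come from the question.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': ['b', 'b', 'b'], 'x': [4, 5, 6]}).set_index(['a', 'b'])
b = pd.DataFrame({'a': [1, 4], 'x': [9, 10]}).set_index('a')
ix = a.index[[0, 1]]

# Left merge on the key 'a': every row of a is kept, in its original order.
# Rows without a partner in b get NaN in the (hypothetical) column x_new.
merged = (a.reset_index()
           .merge(b.rename(columns={'x': 'x_new'}).reset_index(), on='a', how='left')
           .set_index(['a', 'b']))

# Overwrite only rows that are selected by ix AND found a match in b.
mask = merged.index.isin(ix) & merged['x_new'].notna().to_numpy()
a.loc[mask, 'x'] = merged.loc[mask, 'x_new'].astype(a['x'].dtype).to_numpy()
```

With the data from the question this should change only the row (1, 'b'); the merge key 4 in b has no counterpart in a and is dropped by the left merge.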

Upvotes: 1

edesz

Reputation: 12406

I first reset the MultiIndex of a and then set the index to the single column a:

a = a.reset_index()
a = a.set_index('a')

print(a)
   b  x
a      
1  b  4
2  b  5
3  b  6
print(b)
    x
a    
1   9
4  10

Then make the required assignment using loc and set the MultiIndex again:

  • since we are now using loc, your ix = a.index[[0, 1]] effectively becomes [1, 0]: 1 is the row label in a and 0 is the row position in b
a.loc[1, 'x'] = b.iloc[0,0]
a.reset_index(inplace=True)
a = a.set_index(['a','b'])

print(a)
     x
a b   
1 b  9
2 b  5
3 b  6

EDIT:

Alternatively, reset the MultiIndex of a without setting a single-column index afterwards. Then [0, 1] can be used with loc (which refers to index labels, not positions as iloc does): 0 is the row label in a and 1 is the row label in b.

a = a.reset_index()

print(a)
   a  b  x
0  1  b  4
1  2  b  5
2  3  b  6
a.loc[0, 'x'] = b.loc[1,'x']
a = a.set_index(['a','b'])

print(a)
     x
a b   
1 b  9
2 b  5
3 b  6
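The row labels above are hard-coded for this one example. A minimal sketch of how the same reset/set_index idea might be generalised, assuming the first-level keys in b are unique: look each key of a up in b with Series.map and only touch rows selected by ix. The names in_ix, flat and mask are illustrative.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': ['b', 'b', 'b'], 'x': [4, 5, 6]}).set_index(['a', 'b'])
b = pd.DataFrame({'a': [1, 4], 'x': [9, 10]}).set_index('a')
ix = a.index[[0, 1]]

# Remember which rows ix selects before the MultiIndex is flattened.
in_ix = a.index.isin(ix)

# 'a' and 'b' become ordinary columns; row order is unchanged.
flat = a.reset_index()

# Selected rows whose key also exists in b.
mask = in_ix & flat['a'].isin(b.index).to_numpy()

# Look up the new value for each key in b; only masked rows are written.
flat.loc[mask, 'x'] = flat.loc[mask, 'a'].map(b['x'])

# Restore the original two-level index.
a = flat.set_index(['a', 'b'])
```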

Upvotes: 0

Manualmsdos

Reputation: 1545

You are trying to combine a frame with a one-level index and a frame with a two-level index. Just use the values:

EDIT:

import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': ['b', 'b', 'b'], 'x': [4, 5, 6]}).set_index(['a', 'b'])
b = pd.DataFrame({'a': [1, 4], 'x': [9, 10]}).set_index('a')

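a_keys = a.index.get_level_values('a')
# rows at positions 0 and 1 whose first-level key also exists in b
# (the mask must have one entry per row of a, so compare with isin, not ==)
mask = a_keys.isin(a_keys[[0, 1]]) & a_keys.isin(b.index)

# look the keys up in b and assign the raw values (no index alignment)
a.loc[mask, 'x'] = b.loc[a_keys[mask], 'x'].values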

a:

        x
a   b   
1   b   9
2   b   5
3   b   6

Upvotes: 0

Quang Hoang

Reputation: 150735

I can't think of any one-liner, so here's a multi-step approach:

tmp_df = a.loc[ix, ['x']].reset_index(level=1, drop=True)
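# take b's value where the key exists in b, otherwise keep a's current value
tmp_df['x'] = b['x'].reindex(tmp_df.index).fillna(tmp_df['x'])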
tmp_df.index = ix

a.loc[ix, 'x'] = tmp_df['x']

Output:

        x
a   b   
1   b   9.0
2   b   5.0
3   b   6.0

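Edit: I assume that the b values in the index are placeholders and that the real index has no duplicates. Otherwise the code already breaks at a.loc[ix, 'x'], which then returns duplicated rows. For example, with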

a = pd.DataFrame({'a': [1, 1, 2, 3], 
                  'b': ['b', 'b', 'b', 'b'], 
                  'x': [4, 5, 3, 6]}).set_index(['a', 'b'])

a.loc[ix,'x'] gives:

a  b
1  b    4
   b    5
   b    4
   b    5
Name: x, dtype: int64
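If duplicated index entries do have to be supported, a boolean mask with one entry per row avoids the label lookup that duplicates the rows. This is only a sketch under assumptions: ix is taken to be a.index[[0, 1]] of the duplicated frame, the keys in b are unique, and the names keys and mask are illustrative.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 1, 2, 3],
                  'b': ['b', 'b', 'b', 'b'],
                  'x': [4, 5, 3, 6]}).set_index(['a', 'b'])
b = pd.DataFrame({'a': [1, 4], 'x': [9, 10]}).set_index('a')
ix = a.index[[0, 1]]  # assumption: both entries are the duplicated key (1, 'b')

# First-level key of every row of a.
keys = a.index.get_level_values('a')

# One boolean per row: selected by ix and present in b.
mask = a.index.isin(ix) & keys.isin(b.index)

# Look up b's value for each masked key and assign positionally.
a.loc[mask, 'x'] = keys[mask].map(b['x']).to_numpy()
```

Here both rows with key 1 receive the value 9, which matches the behaviour the question asks for when there are several [1, ...] entries.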

Upvotes: 0
