Dance Party

Reputation: 3713

Pandas multi-index subtract from value based on value in other column

Given the following data frame:

import pandas as pd

df = pd.DataFrame({
    ('A', 'a'): [23, 'n/a',54,7,32,76],
    ('B', 'b'): [23, 'n/a',54,'n/a',32,76],
    ('possible','possible'):[100,100,100,100,100,100]
    })
df
    A       B   possible
    a       b   possible
0   23      23      100
1   n/a     n/a     100
2   54      54      100
3   7       n/a     100
4   32      32      100
5   76      76      100

I'd like to adjust 'possible' per row so that every 'n/a' in that row subtracts 4 from 'possible'. The desired result is as follows:

    A       B   possible
    a       b   possible
0   23      23      100
1   n/a     n/a     92
2   54      54      100
3   7       n/a     96
4   32      32      100
5   76      76      100

Then when that's done, I want every instance of 'n/a' to be converted to 0 so that the column type is integer (but float will do).

Thanks in advance!

Follow-up question:

What if my multi-index is like this:

df = pd.DataFrame({
        ('A', 'a'): [23, 'n/a',54,7,32,76],
        ('A', 'b'): [23, 'n/a',54,7,32,76],
        ('B', 'b'): [23, 'n/a',54,7,32,76],
        ('possible','possible'):[100,100,100,100,100,100]
        })

I have 5 upper-level indices and 25 lower-level ones. Is it possible to refer only to the top-level ones in:

no4 = (df.loc[:, (top level indices),(bottom level indices)] == 'n/a').sum(axis=1) 

Upvotes: 2

Views: 1358

Answers (1)

jezrael

Reputation: 863361

You can compare the values with 'n/a', build a mask and use boolean indexing. Finally, replace all 'n/a' values with 0.

Compare the values with 'n/a' and count the matches in each row:

import pandas as pd

idx = pd.IndexSlice
# count the 'n/a' cells in the data columns of each row
no4 = (df.loc[:, idx[('A', 'B'), ('a', 'b')]] == 'n/a').sum(axis=1)
print(no4)
0    0
1    2
2    0
3    1
4    0
5    0
dtype: int64
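For Python 3, the counting step can be run on its own as in the sketch below. It assumes the question's data with row 3 of ('B', 'b') set to 'n/a', which is what the displayed frame and the expected result imply, and it passes the labels to pd.IndexSlice as lists.

```python
import pandas as pd

# Question data; row 3 of ('B', 'b') is assumed to be 'n/a', matching the expected output
df = pd.DataFrame({
    ('A', 'a'): [23, 'n/a', 54, 7, 32, 76],
    ('B', 'b'): [23, 'n/a', 54, 'n/a', 32, 76],
    ('possible', 'possible'): [100, 100, 100, 100, 100, 100],
})

idx = pd.IndexSlice
# Select the (A, a) and (B, b) columns and mark every 'n/a' cell as True
is_na = df.loc[:, idx[['A', 'B'], ['a', 'b']]] == 'n/a'
# Summing booleans along axis=1 counts the 'n/a' cells per row
no4 = is_na.sum(axis=1)
print(no4.tolist())  # [0, 2, 0, 1, 0, 0]
```

The 'possible' column is left out of the selection, so its values never take part in the comparison.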

Build a mask of the rows whose count is not 0 (i.e. rows that contain at least one n/a):

mask = no4 != 0
print(mask)
0    False
1     True
2    False
3     True
4    False
5    False
dtype: bool

Subtract 4 times no4 from 'possible' in the masked rows, then replace n/a with 0:

df.loc[mask, idx['possible', 'possible']] -= no4 * 4
df.replace({'n/a':0}, inplace=True)
print(df)
    A   B possible
    a   b possible
0  23  23    100.0
1   0   0     92.0
2  54  54    100.0
3   7   0     96.0
4  32  32    100.0
5  76  76    100.0
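A self-contained Python 3 version of the masked approach is sketched below, using the same assumed data. It is a small variation on the line above: no4 is indexed with the same mask, so both operands of the subtraction cover the same rows and no NaN should appear in the intermediate result (in the original line the unmatched rows produce NaN, which is why 'possible' can show up as float).

```python
import pandas as pd

# Same assumed data as the question (row 3 of ('B', 'b') is 'n/a')
df = pd.DataFrame({
    ('A', 'a'): [23, 'n/a', 54, 7, 32, 76],
    ('B', 'b'): [23, 'n/a', 54, 'n/a', 32, 76],
    ('possible', 'possible'): [100, 100, 100, 100, 100, 100],
})

idx = pd.IndexSlice
no4 = (df.loc[:, idx[['A', 'B'], ['a', 'b']]] == 'n/a').sum(axis=1)
mask = no4 != 0  # rows with at least one 'n/a'

# Index no4 with the same mask so both operands cover the same rows
df.loc[mask, ('possible', 'possible')] -= no4[mask] * 4
# Replace every remaining 'n/a' with 0
df = df.replace({'n/a': 0})
print(df)
```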

EDIT:

I found a simpler solution: the mask is not necessary, because rows without n/a just get 0 subtracted:

idx = pd.IndexSlice
print((df.loc[:, idx[('A', 'B'), ('a', 'b')]] == 'n/a').sum(axis=1) * 4)
0    0
1    8
2    0
3    4
4    0
5    0
dtype: int64

df.loc[:, idx['possible', 'possible']] -= (
    (df.loc[:, idx[('A', 'B'), ('a', 'b')]] == 'n/a').sum(axis=1) * 4
)
df.replace({'n/a':0}, inplace=True)
print(df)
    A   B possible
    a   b possible
0  23  23      100
1   0   0       92
2  54  54      100
3   7   0       96
4  32  32      100
5  76  76      100
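The question also asks for integer columns once n/a has become 0. After replace the data columns may still be object dtype, depending on the pandas version, so the sketch below adds an explicit astype(int). The cast is an addition to the answer, and the data is the same assumed data as above.

```python
import pandas as pd

df = pd.DataFrame({
    ('A', 'a'): [23, 'n/a', 54, 7, 32, 76],
    ('B', 'b'): [23, 'n/a', 54, 'n/a', 32, 76],
    ('possible', 'possible'): [100, 100, 100, 100, 100, 100],
})

idx = pd.IndexSlice
# Rows without 'n/a' get 0 * 4 = 0 subtracted, so no mask is needed
penalty = (df.loc[:, idx[['A', 'B'], ['a', 'b']]] == 'n/a').sum(axis=1) * 4
df[('possible', 'possible')] -= penalty

# Turn every 'n/a' into 0, then cast all columns to integer
df = df.replace({'n/a': 0}).astype(int)
print(df.dtypes)
```

Casting after the replace keeps the counting step working on the original strings; casting first would fail on 'n/a'.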

EDIT1: If you need to select only the top-level indices, see using slicers:

(df.loc[:, idx[['A', 'B'], :]] == 'n/a').sum(axis=1)  # list only top-level labels; ':' takes every lower-level one
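Applied to the follow-up frame, selecting by top level only could look like the sketch below. It assumes every lower-level column under the listed top-level labels should be counted; idx[top, :] keeps all of them, so the 25 lower-level labels never have to be spelled out. The name top is just an illustrative variable.

```python
import pandas as pd

# Follow-up data from the question
df = pd.DataFrame({
    ('A', 'a'): [23, 'n/a', 54, 7, 32, 76],
    ('A', 'b'): [23, 'n/a', 54, 7, 32, 76],
    ('B', 'b'): [23, 'n/a', 54, 7, 32, 76],
    ('possible', 'possible'): [100, 100, 100, 100, 100, 100],
})

idx = pd.IndexSlice
top = ['A', 'B']  # hypothetical list: only the top-level labels are given
# ':' in the second position takes every lower-level label under 'A' and 'B'
counts = (df.loc[:, idx[top, :]] == 'n/a').sum(axis=1)
df[('possible', 'possible')] -= counts * 4
df = df.replace({'n/a': 0})
print(counts.tolist())  # [0, 3, 0, 0, 0, 0]
```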

Upvotes: 2
