Reputation: 3713
Given the following data frame:
import pandas as pd

df = pd.DataFrame({
    ('A', 'a'): [23, 'n/a', 54, 7, 32, 76],
    ('B', 'b'): [23, 'n/a', 54, 'n/a', 32, 76],
    ('possible', 'possible'): [100, 100, 100, 100, 100, 100]
})
df
A B possible
a b possible
0 23 23 100
1 n/a n/a 100
2 54 54 100
3 7 n/a 100
4 32 32 100
5 76 76 100
I'd like to adjust 'possible', per row, for every instance of 'n/a' such that each instance will subtract 4 from 'possible'. The desired result is as follows:
A B possible
a b possible
0 23 23 100
1 n/a n/a 92
2 54 54 100
3 7 n/a 96
4 32 32 100
5 76 76 100
Then when that's done, I want every instance of 'n/a' to be converted to 0 so that the column type is integer (but float will do).
Thanks in advance!
Follow-up question:
What if my multi-index is like this:
df = pd.DataFrame({
('A', 'a'): [23, 'n/a',54,7,32,76],
('A', 'b'): [23, 'n/a',54,7,32,76],
('B', 'b'): [23, 'n/a',54,7,32,76],
('possible','possible'):[100,100,100,100,100,100]
})
I have 5 upper level indices and 25 lower level ones. I'm wondering if it's possible to only refer to the top ones in
no4 = (df.loc[:, (top level indices),(bottom level indices)] == 'n/a').sum(axis=1)
Upvotes: 2
Views: 1358
Reputation: 863361
I think you can count the 'n/a' values with a boolean comparison, use the result as a mask with boolean indexing, and finally replace all 'n/a' values with 0.
First compare the values with 'n/a' and sum the matches per row:
idx = pd.IndexSlice
no4 = (df.loc[:, idx[['A', 'B'], ['a', 'b']]] == 'n/a').sum(axis=1)
print(no4)
0 0
1 2
2 0
3 1
4 0
5 0
dtype: int64
Check where the sum is not equal to 0 (a non-zero sum means the row contains 'n/a' values):
mask = no4 != 0
print(mask)
0 False
1 True
2 False
3 True
4 False
5 False
dtype: bool
Subtract 4 times no4 from 'possible' in the masked rows, then replace 'n/a' with 0:
df.loc[mask, idx['possible', 'possible']] -= no4 * 4
df.replace({'n/a': 0}, inplace=True)
print(df)
A B possible
a b possible
0 23 23 100.0
1 0 0 92.0
2 54 54 100.0
3 7 0 96.0
4 32 32 100.0
5 76 76 100.0
EDIT:
I found a simpler solution. The mask is not necessary, because you subtract 0 for rows that contain no 'n/a':
idx = pd.IndexSlice
print((df.loc[:, idx[['A', 'B'], ['a', 'b']]] == 'n/a').sum(axis=1) * 4)
0 0
1 8
2 0
3 4
4 0
5 0
dtype: int64
df.loc[:, idx['possible', 'possible']] -= (
    (df.loc[:, idx[['A', 'B'], ['a', 'b']]] == 'n/a').sum(axis=1) * 4
)
df.replace({'n/a': 0}, inplace=True)
print(df)
A B possible
a b possible
0 23 23 100
1 0 0 92
2 54 54 100
3 7 0 96
4 32 32 100
5 76 76 100
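For reference, the simpler approach can be put together as one self-contained Python 3 sketch. It assumes the 'B' column holds 'n/a' in row 3, as in the printed frame, and it passes lists rather than tuples to pd.IndexSlice, since pandas treats a list as "several labels" on a level. The names scores and penalty are only illustrative. The final conversion uses pd.to_numeric with errors='coerce', which turns any non-numeric entry (not just 'n/a') into 0, so it is only appropriate if 'n/a' is the sole marker in the data.

```python
import pandas as pd

df = pd.DataFrame({
    ('A', 'a'): [23, 'n/a', 54, 7, 32, 76],
    ('B', 'b'): [23, 'n/a', 54, 'n/a', 32, 76],
    ('possible', 'possible'): [100, 100, 100, 100, 100, 100],
})

idx = pd.IndexSlice

# Select the score columns; lists name several labels per level.
scores = df.loc[:, idx[['A', 'B'], ['a', 'b']]]

# Count 'n/a' per row and turn the count into a penalty of 4 each.
penalty = (scores == 'n/a').sum(axis=1) * 4

# Rows without 'n/a' have a penalty of 0, so no mask is needed.
df[('possible', 'possible')] = df[('possible', 'possible')] - penalty

# Convert the score columns to integers: non-numeric entries become
# NaN, are filled with 0, and the column is then cast to int64.
for col in scores.columns:
    df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0).astype('int64')

print(df)
print(df.dtypes)
```

Because the penalty is computed before the markers are replaced, the order of the two steps matters: replacing 'n/a' first would leave nothing to count.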
EDIT1: If you need to select by the top-level indices only, use slicers:
(df.loc[:, idx[(top level indices),:]] == 'n/a').sum(axis=1)
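As a concrete illustration of the slicer, here is a small sketch on the frame from the follow-up question. The names df2 and top_levels are hypothetical, and ['A', 'B'] is just an assumed choice of top-level labels. The `:` in the second position selects every lower-level label under them.

```python
import pandas as pd

df2 = pd.DataFrame({
    ('A', 'a'): [23, 'n/a', 54, 7, 32, 76],
    ('A', 'b'): [23, 'n/a', 54, 7, 32, 76],
    ('B', 'b'): [23, 'n/a', 54, 7, 32, 76],
    ('possible', 'possible'): [100, 100, 100, 100, 100, 100],
})

idx = pd.IndexSlice
top_levels = ['A', 'B']  # assumed top-level labels to inspect

# Slice by top level only; ':' keeps all lower-level columns.
no4 = (df2.loc[:, idx[top_levels, :]] == 'n/a').sum(axis=1)
print(no4)

# Plain column selection with a list of top-level labels should
# pick the same columns here.
no4_alt = (df2[top_levels] == 'n/a').sum(axis=1)
print(no4.equals(no4_alt))
```

With 5 upper-level and 25 lower-level labels, this avoids listing the lower level at all.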
Upvotes: 2