Reputation: 3826
I have got a dataframe:
df = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': [1,0,0,1,1,0,0,1]})
df2 = df.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])
df2['A']['a']['x'][4] = 1
df2['B']['a']['x'][3] = 1
variable1 A B
variable2 a b a b
variable3 x y x y x y
index
0 1 NaN NaN NaN NaN NaN
1 NaN NaN 0 NaN NaN NaN
2 NaN NaN NaN NaN 0 NaN
3 NaN NaN NaN NaN 1 1
4 1 1 NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN 0
6 NaN NaN NaN NaN 0 NaN
7 NaN NaN NaN 1 NaN NaN
Now I want to check for simultaneous occurrences of x == 1
and y == 1
, but only within each subgroup, defined by variable1
and variable2
. So, for the dataframe shown above, the condition is met for index == 4
(group A-a
), but not for index == 3
(groups B-a
and B-b
).
I suppose some groupby()
magic would be needed, but I cannot find the right way. I have also tried experimenting with a stacked dataframe (using df.stack()
), but this did not get me any closer...
Upvotes: 0
Views: 871
Reputation: 32125
You can use groupby
on the 2 first levels variable1
and variable2
to get the sum of the x
and y
columns at that level:
r = df2.groupby(level=[0,1], axis=1).sum()
r
Out[50]:
variable1 A B
variable2 a b a b
index
0 1 NaN NaN NaN
1 NaN 0 NaN NaN
2 NaN NaN 0 NaN
3 NaN NaN 1 1
4 2 NaN NaN NaN
5 NaN NaN NaN 0
6 NaN NaN 0 NaN
7 NaN 1 NaN NaN
Consequently, the lines you are searching for are the ones that contain the value 2:
r[r==2].dropna(how='all')
Out[53]:
variable1 A B
variable2 a b a b
index
4 2 NaN NaN NaN
Upvotes: 1