Conditional count calculations groupwise

Question

I am calculating the conditional count (only 4's and 5's divided by all values over all columns) of the following table

v1|v2|v3|
 2| 3| 4|
  | 5| 4|
 5| 1| 4|

with this: df.isin([4,5]).sum().div(df.count()).mean(). I thought it would be easy to apply this command grouped by an additional column like this

code|v1|v2|v3|
1234|2| 3| 4|
1234| | 5| 4|
1234|5| 1| 4|
2345|2|  | 4|
2345| | 5| 4|
2345|5| 1| 4|

to get one value for each code using df.isin([4,5]).groupby().sum().div(df.count()).mean(), but instead I get a weird result like

code|
v1  |0.985
v2  |0.475
v3  |0.874

Instead of

code|
1234|0.611
2345|0.666

Could anyone help me rewrite or extend the command I have so that I get one value for each code?

EDIT: Due to misunderstandings, I am adding some more information.

For code 1234 we have:

Column 1: one 4 or 5 out of 2 values altogether = 1/2
Column 2: one 4 or 5 out of 3 values altogether = 1/3
Column 3: three 4s or 5s out of 3 values altogether = 3/3

Now we calculate the mean of these: (1/2 + 1/3 + 3/3)/3 = 0.61111
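
The arithmetic above can be double-checked with a short sketch in plain Python. The dictionary name `columns_1234` and the (hits, non-missing count) pairs are simply read off the table for code 1234; they are illustrative, not part of the original command.

```python
from fractions import Fraction

# (number of 4s/5s, number of non-missing values) per column for code 1234,
# read off the table in the question.
columns_1234 = {"v1": (1, 2), "v2": (1, 3), "v3": (3, 3)}

# Ratio per column, then the plain mean of those ratios.
ratios = [Fraction(hits, total) for hits, total in columns_1234.values()]
score_1234 = sum(ratios) / len(ratios)

print(score_1234, float(score_1234))  # 11/18, roughly 0.611
```

Note that this is a mean of per-column ratios, not the total number of 4s and 5s divided by the total number of values (which would be 5/8 here).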

This is for one code. I need this for all codes. In R I would split the table into a list with one element per code and then calculate what I did above for each element. I am wondering if I could do this by extending the line of code I got from my other question, for example with .groupby?

r.ook · Accepted Answer

One way is to use groupby.apply with code as the index and apply the expression from your question to each group. It's not an optimal solution, but it computes the average exactly the way you described:

Given a DataFrame that looks like this:

>>> df
   code    v1    v2 v3
0  1234     2     3  4
1  1234  None     5  4
2  1234     5     1  4
3  2345     2  None  4
4  2345  None     5  4
5  2345     5     1  4

Do this:

>>> df.set_index('code').groupby(level=0).apply(lambda d: d.isin((4, 5)).sum().div(d.count()).mean())
code
1234    0.611111
2345    0.666667
dtype: float64
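
For anyone who wants to reproduce this, here is a self-contained sketch. It rebuilds the example table (using NaN for the blanks, which is an assumption about how the missing values are stored) and computes the same per-code value without apply, by grouping the numerator and the denominator separately. The names `value_cols`, `hits`, `counts` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Example data from the question; blanks are assumed to be NaN.
df = pd.DataFrame({
    "code": [1234, 1234, 1234, 2345, 2345, 2345],
    "v1": [2, np.nan, 5, 2, np.nan, 5],
    "v2": [3, 5, 1, np.nan, 5, 1],
    "v3": [4, 4, 4, 4, 4, 4],
})

value_cols = ["v1", "v2", "v3"]

# Numerator: per code and column, how many entries are 4 or 5.
hits = df[value_cols].isin([4, 5]).groupby(df["code"]).sum()

# Denominator: per code and column, how many entries are non-missing.
counts = df.groupby("code")[value_cols].count()

# Ratio per column, then the mean across the columns of each code.
result = hits.div(counts).mean(axis=1)
print(result)
```

On the example data this should give the same two values as the apply version above (about 0.611 for 1234 and 0.667 for 2345). Grouping the two pieces separately avoids calling a Python lambda per group, which may matter on larger tables.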
