Reputation: 5545
I am calculating a conditional share (the number of 4s and 5s divided by the number of non-missing values in each column, averaged over all columns) for the following table
v1|v2|v3|
2| 3| 4|
| 5| 4|
5| 1| 4|
with this command: df.isin([4,5]).sum().div(df.count()).mean()
I thought it would be easy to apply this command grouped by an additional column, like this
code|v1|v2|v3|
1234|2| 3| 4|
1234| | 5| 4|
1234|5| 1| 4|
2345|2| | 4|
2345| | 5| 4|
2345|5| 1| 4|
to get one value for each code, using this:
df.isin([4,5]).groupby().sum().div(df.count()).mean()
But instead I get an unexpected result, with one value per column rather than one per code:
code|
v1 |0.985
v2 |0.475
v3 |0.874
Instead of
code|
1234|0.611
2345|0.666
Could anyone help me rewrite or extend the command I have so that I get one value for each code?
EDIT: Because of some misunderstandings, here is more information.
For code 1234 we have:
Column 1: one 4 or 5 out of 2 values altogether = 1/2
Column 2: one 4 or 5 out of 3 values altogether = 1/3
Column 3: three 4s or 5s out of 3 values altogether = 3/3
Now we calculate the mean of these: (1/2 + 1/3 + 3/3) / 3 = 0.61111
This is for one code; I need it for all codes. In R, I would split the table into a list with one element per code and apply the calculation above to each element. I am wondering whether I could do this by extending the line of code I got from my other question, for example with .groupby?
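The arithmetic in the edit can be checked with exact fractions. This is only a sketch of the hand calculation. The per-column ratios are read off the example table, and the variable names are made up for illustration:

```python
from fractions import Fraction

# Per-column share of 4s/5s among non-missing values, read off the example table.
ratios_1234 = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 3)]
ratios_2345 = [Fraction(1, 2), Fraction(1, 2), Fraction(3, 3)]

# Mean of the per-column shares for each code.
mean_1234 = sum(ratios_1234) / len(ratios_1234)
mean_2345 = sum(ratios_2345) / len(ratios_2345)

print(mean_1234, float(mean_1234))  # 11/18, roughly 0.611
print(mean_2345, float(mean_2345))  # 2/3, roughly 0.667
```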
Upvotes: 0
Views: 61
Reputation: 13878
One way is to use groupby.apply with code as the index, and then apply the expression from your question to each group. It's not an optimal solution, but it computes the average in the specific way you described.
Given a DataFrame that looks like this:
>>> df
code v1 v2 v3
0 1234 2 3 4
1 1234 None 5 4
2 1234 5 1 4
3 2345 2 None 4
4 2345 None 5 4
5 2345 5 1 4
Do this:
>>> df.set_index('code').groupby(level=0).apply(lambda d: d.isin((4, 5)).sum().div(d.count()).mean())
code
1234 0.611111
2345 0.666667
dtype: float64
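If apply turns out to be slow on a larger table, the same numbers can probably be obtained with two grouped aggregations instead. The following is a sketch rather than a tested drop-in. It rebuilds the example frame with NaN for the missing cells (an assumption about how the blanks are stored), and the names vals, hits and present are just illustrative:

```python
import numpy as np
import pandas as pd

# Example data from the question; blanks are assumed to be NaN.
df = pd.DataFrame({
    "code": [1234, 1234, 1234, 2345, 2345, 2345],
    "v1": [2, np.nan, 5, 2, np.nan, 5],
    "v2": [3, 5, 1, np.nan, 5, 1],
    "v3": [4, 4, 4, 4, 4, 4],
})

vals = df.set_index("code")

# Number of 4s and 5s per code and column.
hits = vals.isin([4, 5]).groupby(level=0).sum()
# Number of non-missing values per code and column.
present = vals.groupby(level=0).count()

# Per-column share, then the mean across columns for each code.
result = hits.div(present).mean(axis=1)
print(result)
```

On the example data this should give the same two values as the apply version above.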
Upvotes: 1