Reputation: 141
From a dataframe, after using groupby, I want to identify the successful/unsuccessful rate. With my limited knowledge of Python and pandas I made it work, but it looks inefficient and clumsy. There must be a simpler and more efficient way. Any help would be much appreciated.
BldgID BldHt Device Date Time Result
1213 35 758 20181120 105550 OK
1213 35 758 20181120 105540 NG
1112 40 780 20181120 003102 OK
1117 26 790 20181120 002557 OK
1111 65 780 20181120 002102 NG
1214 80 758 20181120 001600 OK
2902 34 780 20181119 005410 OK
1232 90 780 20181119 001410 OK
1222 75 760 20181119 010552 OK
1214 80 758 20181119 010052 OK
1214 80 758 20181119 005553 NG
1246 16 790 20181119 004556 OK
1128 15 758 20181119 004552 OK
1128 15 758 20181119 004552 OK
1211 30 790 20181119 003557 NG
1211 30 790 20181119 003558 OK
Here is the dataframe (csv). First I want to group by [BldgID, BldHt, Device] and find the percentage of 'OK' in Result, i.e. OK/(OK+NG). What I did was: change 'OK' to 1 and 'NG' to 0, then sum to count the 'OK's. To find the total number (OK+NG), I used the dataframe from before replacing the values and counted the rows; otherwise it did not count the NGs. Then I divided the two to find the percentage of 'OK'.
import pandas as pd

df = pd.read_csv("data.csv")

# Map OK -> 1, NG -> 0 so the group sum counts the OKs
df1 = df.replace({'OK': 1, 'NG': 0})
df1 = df1.groupby(['BldgID', 'Device'])['Result'].agg('sum').reset_index()
df1['NumOKs'] = df1['Result']

# Use the original df (before replacing OK/NG) to count the total
# number of rows, i.e. OK+NG
df2 = df.groupby(['BldgID', 'Device'])['Result'].agg('count').reset_index()
df2['sum'] = df2['Result']
df2.drop(['Result'], axis=1, inplace=True)

df3 = pd.concat([df1['NumOKs'], df2['sum']], axis=1, keys=['NumOKs', 'sum'])
df3.head(10)
# sum represents OK+NG
NumOKs sum
0 2 2
1 6 6
2 2 2
3 2 2
4 2 2
5 3 4
6 3 3
7 3 3
8 2 3
9 3 3
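The final step described in the text, dividing NumOKs by the total to get the percentage of 'OK', can be sketched as below (the df3 values are recreated by hand from the output above, so this is a standalone illustration, not the original pipeline):

```python
import pandas as pd

# Hypothetical recreation of df3 from the printed output above
df3 = pd.DataFrame({'NumOKs': [2, 6, 2, 2, 2, 3, 3, 3, 2, 3],
                    'sum':    [2, 6, 2, 2, 2, 4, 3, 3, 3, 3]})

# Fraction of OK per group: NumOKs / (OK + NG)
df3['ok_rate'] = df3['NumOKs'] / df3['sum']
print(df3)
```

Row 5, for example, gives 3/4 = 0.75.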
I got what I wanted but it looks very ugly and inefficient. But with my limited knowledge I have to find something quick and it works. Want to achieve this in a better and simpler way. note: The result may not be exactly reproduced as I cut-paste a small part of code and result to explain what I want and how (badly) I did it.
Upvotes: 0
Views: 288
Reputation: 153470
IIUC, try using mean and a boolean condition:
df.groupby(['BldgID', 'Device'])['Result'].apply(lambda x : (x=='OK').mean())
To add this back to your dataframe, use transform instead of apply so the per-group result is broadcast back to the original row index:

df['mean_ok'] = df.groupby(['BldgID', 'Device'])['Result']\
                  .transform(lambda x: (x == 'OK').mean())
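A minimal self-contained run of this approach, using a hand-built subset of the sample data from the question (so the numbers here are only for this subset, not the full csv): comparing the column to 'OK' yields booleans, and the mean of booleans is the fraction of True values.

```python
import pandas as pd

# Hypothetical subset of the question's data
df = pd.DataFrame({
    'BldgID': [1213, 1213, 1214, 1214, 1214],
    'Device': [758, 758, 758, 758, 758],
    'Result': ['OK', 'NG', 'OK', 'OK', 'NG'],
})

# Per-group OK rate: mean of the boolean mask (x == 'OK')
ok_rate = df.groupby(['BldgID', 'Device'])['Result'] \
            .apply(lambda x: (x == 'OK').mean())
print(ok_rate)
# (1213, 758) -> 0.5 (1 OK of 2), (1214, 758) -> 2/3 (2 OK of 3)
```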
Upvotes: 2