Reputation:
I have a dataframe like below having patients stay in ICU (in hours) that is shown by ICULOS.
df # Main dataframe
dfy = df.copy()
P_ID | ICULOS | Count |
---|---|---|
1 | 1 | 5 |
1 | 2 | 5 |
1 | 3 | 5 |
1 | 4 | 5 |
1 | 5 | 5 |
2 | 1 | 9 |
2 | 2 | 9 |
2 | 3 | 9 |
2 | 4 | 9 |
2 | 5 | 9 |
2 | 6 | 9 |
2 | 7 | 9 |
2 | 8 | 9 |
2 | 9 | 9 |
3 | 1 | 3 |
3 | 2 | 3 |
3 | 3 | 3 |
4 | 1 | 7 |
4 | 2 | 7 |
4 | 3 | 7 |
4 | 4 | 7 |
4 | 5 | 7 |
4 | 6 | 7 |
4 | 7 | 7 |
I calculated their ICULOS Count and placed in the new column named Count using the code:
dfy['Count'] = dfy.groupby(['P_ID'])['ICULOS'].transform('count')
Now, I want to remove those patients based on P_ID whose Count is less than 8. (Note, I want to remove whole patient record). So, after removing the patients with Count < 8, Only the P_ID = 2 will remain as the count is 9.
The desired output:
P_ID | ICULOS | Count |
---|---|---|
2 | 1 | 9 |
2 | 2 | 9 |
2 | 3 | 9 |
2 | 4 | 9 |
2 | 5 | 9 |
2 | 6 | 9 |
2 | 7 | 9 |
2 | 8 | 9 |
2 | 9 | 9 |
I tried the following code, but for some reason, it is not working for me. It did worked for me but when I re-run the code after few days, it is giving me 0 result. Can someone suggest a better code? Thanks.
dfy = dfy.drop_duplicates(subset=['P_ID'],keep='first')
lis1 = dfy['P_ID'].tolist()
Icu_less_8 = dfy.loc[dfy['Count'] < 8]
lis2 = Icu_less_8.P_ID.to_list()
lis_3 = [k for k in tqdm_notebook(lis1) if k not in lis2]
# removing those patients who have ICULOS of less than 8 hours
df_1 = pd.DataFrame()
for l in tqdm_notebook(lis_3, desc = 'Progress'):
df_1 = df_1.append(df.loc[df['P_ID']==l])
Upvotes: 2
Views: 144
Reputation: 34046
You can directly filter rows in transform using Series.ge
:
In [1521]: dfy[dfy.groupby(['P_ID'])['ICULOS'].transform('count').ge(8)]
Out[1521]:
P_ID ICULOS Count
5 2 1 9
6 2 2 9
7 2 3 9
8 2 4 9
9 2 5 9
10 2 6 9
11 2 7 9
12 2 8 9
13 2 9 9
EDIT after OP's comment: For multiple conditions, do:
In [1533]: x = dfy.groupby(['P_ID'])['ICULOS'].transform('count')
In [1539]: dfy.loc[x[x.ge(8) & x.le(72)].index]
Out[1539]:
P_ID ICULOS Count
5 2 1 9
6 2 2 9
7 2 3 9
8 2 4 9
9 2 5 9
10 2 6 9
11 2 7 9
12 2 8 9
13 2 9 9
Upvotes: 3