user17978261
user17978261

Reputation:

Calculate count of a column based on other column in python dataframe

I have a dataframe like below having patients stay in ICU (in hours) that is shown by ICULOS.

df # Main dataframe
dfy = df.copy()

dfy

P_ID ICULOS Count
1 1 5
1 2 5
1 3 5
1 4 5
1 5 5
2 1 9
2 2 9
2 3 9
2 4 9
2 5 9
2 6 9
2 7 9
2 8 9
2 9 9
3 1 3
3 2 3
3 3 3
4 1 7
4 2 7
4 3 7
4 4 7
4 5 7
4 6 7
4 7 7

I calculated their ICULOS Count and placed in the new column named Count using the code:

dfy['Count'] = dfy.groupby(['P_ID'])['ICULOS'].transform('count')

Now, I want to remove those patients based on P_ID whose Count is less than 8. (Note, I want to remove whole patient record). So, after removing the patients with Count < 8, Only the P_ID = 2 will remain as the count is 9.

The desired output:

P_ID ICULOS Count
2 1 9
2 2 9
2 3 9
2 4 9
2 5 9
2 6 9
2 7 9
2 8 9
2 9 9

I tried the following code, but for some reason, it is not working for me. It did worked for me but when I re-run the code after few days, it is giving me 0 result. Can someone suggest a better code? Thanks.

dfy = dfy.drop_duplicates(subset=['P_ID'],keep='first')
lis1 = dfy['P_ID'].tolist()
Icu_less_8 = dfy.loc[dfy['Count'] < 8]
lis2 = Icu_less_8.P_ID.to_list()
lis_3  = [k for k in tqdm_notebook(lis1) if k not in lis2]
# removing those patients who have ICULOS of less than 8 hours
df_1 = pd.DataFrame()
for l in tqdm_notebook(lis_3, desc = 'Progress'):
    df_1 = df_1.append(df.loc[df['P_ID']==l])

Upvotes: 2

Views: 144

Answers (1)

Mayank Porwal
Mayank Porwal

Reputation: 34046

You can directly filter rows in transform using Series.ge:

In [1521]: dfy[dfy.groupby(['P_ID'])['ICULOS'].transform('count').ge(8)]
Out[1521]: 
    P_ID  ICULOS  Count
5      2       1      9
6      2       2      9
7      2       3      9
8      2       4      9
9      2       5      9
10     2       6      9
11     2       7      9
12     2       8      9
13     2       9      9

EDIT after OP's comment: For multiple conditions, do:

In [1533]: x = dfy.groupby(['P_ID'])['ICULOS'].transform('count')
In [1539]: dfy.loc[x[x.ge(8) & x.le(72)].index]
Out[1539]: 
    P_ID  ICULOS  Count
5      2       1      9
6      2       2      9
7      2       3      9
8      2       4      9
9      2       5      9
10     2       6      9
11     2       7      9
12     2       8      9
13     2       9      9

Upvotes: 3

Related Questions