Reputation: 1
classes money mask
student
6 psp -26.23 adj|psp # keep
6 tnt 0.00 adj|psp
6 nvm -87.42 adj|psp
6 alw 0.00 adj|psp
6 tnt -14.87 adj|psp
6 adj 9.55 adj|psp # keep
6 psp -18.56 adj|psp # keep
6 wzi -17.41 adj|psp
6 nvm 65.97 adj|psp
6 psp 27.41 adj|psp # keep
89 alw -180.33 alw # keep
89 alw -869.44 alw # keep
89 tnt -95.66 alw
89 wzi -35.43 alw
89 nvm -144.90 alw
89 alw 180.23 alw # keep
89 tnt 35.43 alw
105 tnt -24.01 tnt # keep
105 wzi 213.13 tnt
105 ins -26.06 tnt
105 tnt 60.81 tnt # keep
I would like to filter this data frame so that I only have the rows for each account that correspond to the mask. For example, for student 6, I only want rows that have classes "adj" or "psp".
I tried to do .isin() but the list I am using constantly changes, so any thoughts on how to do this most efficiently?
In the end, I am looking to get the total summation of those classes and mark ones that are greater than zero. I'd like to turn the loop below into something more efficient if possible. Thanks!
bad_list = []
for i in test.index.unique():
df_ = test.loc[i]
mask_class = df_["mask"].str.split("|")
mask_class = mask_class.iloc[0]
df_mask = df_[df_['classes'].isin(mask_class)]
if(sum(df_mask["money"])>0):
bad_list.append(i)
```
Upvotes: 0
Views: 963
Reputation: 28644
You could create a temp column that has the mask column split, explode the column and filter for only rows that match classes with the temp column:
(df
#create temporary column
.assign(mask_expand = lambda x: x['mask'].str.split('|'))
# 'explode' column into individual elements per row
.explode('mask_expand')
#keep only rows where the values in classes are in mask_expand(the temporary column)
.query('classes == mask_expand')
#get rid of the temporary column
.drop('mask_expand', axis = 1)
)
student classes money mask
0 6 psp -26.23 adj|psp
5 6 adj 9.55 adj|psp
6 6 psp -18.56 adj|psp
9 6 psp 27.41 adj|psp
10 89 alw -180.33 alw
11 89 alw -869.44 alw
15 89 alw 180.23 alw
17 105 tnt -24.01 tnt
20 105 tnt 60.81 tnt
Upvotes: 1
Reputation: 4607
If your condition changes from student to student, you can all specifics in dictionary and apply with groupby
condition = {6:['adj','psp'],
89:['alw'],
105:['tnt']}
df1.groupby('student').apply(lambda x: x.loc[x['classes'].isin(condition[x['student'].iloc[0]])])
Out:
student classes money
student
6 0 6 psp -26.23
5 6 adj 9.55
6 6 psp -18.56
9 6 psp 27.41
89 10 89 alw -180.33
11 89 alw -869.44
15 89 alw 180.23
105 17 105 tnt -24.01
20 105 tnt 60.81
Upvotes: 0