Anna Whelan
Anna Whelan

Reputation: 1

Is there a way to use pandas .isin() function with multiple lists?

    classes money    mask
student         
6   psp   -26.23    adj|psp  # keep
6   tnt   0.00      adj|psp
6   nvm   -87.42    adj|psp
6   alw   0.00      adj|psp
6   tnt   -14.87    adj|psp
6   adj   9.55      adj|psp  # keep
6   psp   -18.56    adj|psp  # keep
6   wzi   -17.41    adj|psp
6   nvm   65.97     adj|psp
6   psp   27.41     adj|psp  # keep
89  alw   -180.33   alw      # keep
89  alw   -869.44   alw      # keep
89  tnt   -95.66    alw
89  wzi   -35.43    alw
89  nvm   -144.90   alw
89  alw   180.23    alw      # keep
89  tnt   35.43     alw
105 tnt   -24.01    tnt      # keep
105 wzi   213.13    tnt
105 ins   -26.06    tnt
105 tnt   60.81     tnt      # keep

I would like to filter this data frame so that I only have the rows for each account that correspond to the mask. For example, for student 6, I only want rows that have classes "adj" or "psp".

I tried to do .isin() but the list I am using constantly changes, so any thoughts on how to do this most efficiently?

In the end, I am looking to get the total summation of those classes and mark ones that are greater than zero. I'd like to turn the loop below into something more efficient if possible. Thanks!

bad_list = []

for i in test.index.unique():
    df_ = test.loc[i]
    mask_class = df_["mask"].str.split("|")
    mask_class = mask_class.iloc[0]
    df_mask = df_[df_['classes'].isin(mask_class)]
    if(sum(df_mask["money"])>0):
        bad_list.append(i)
    ```

Upvotes: 0

Views: 963

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

You could create a temp column that has the mask column split, explode the column and filter for only rows that match classes with the temp column:

(df
  #create temporary column
 .assign(mask_expand = lambda x: x['mask'].str.split('|'))
  # 'explode' column into individual elements per row
 .explode('mask_expand')
  #keep only rows where the values in classes are in mask_expand(the temporary column)
 .query('classes == mask_expand')
  #get rid of the temporary column
 .drop('mask_expand', axis = 1)
 )

    student classes money   mask
 0     6      psp   -26.23  adj|psp
 5     6      adj   9.55    adj|psp
 6     6      psp   -18.56  adj|psp
 9     6      psp   27.41   adj|psp
 10    89     alw   -180.33 alw
 11    89     alw   -869.44 alw
 15    89     alw   180.23  alw
 17    105    tnt   -24.01  tnt
 20    105    tnt   60.81   tnt

Upvotes: 1

Naga kiran
Naga kiran

Reputation: 4607

If your condition changes from student to student, you can all specifics in dictionary and apply with groupby

condition = {6:['adj','psp'],
         89:['alw'],
        105:['tnt']}

df1.groupby('student').apply(lambda x: x.loc[x['classes'].isin(condition[x['student'].iloc[0]])])

Out:

        student classes money
student             
6   0   6   psp -26.23
    5   6   adj 9.55
    6   6   psp -18.56
    9   6   psp 27.41
89  10  89  alw -180.33
    11  89  alw -869.44
    15  89  alw 180.23
105 17  105 tnt -24.01
    20  105 tnt 60.81

Upvotes: 0

Related Questions