Reputation: 394

Query dataframe columns using multiple variables/inputs

I've encountered a situation where I need to filter a dataframe via input(s) that are found in columns P1-P5 below. There could be anywhere from 1 to 5 inputs and they could be located in any of P1-P5.

   TeamAbb    P1       P2      P3      P4      P5
0     ATL1  203953  1627745 1629027 1629629 1629631
1     ATL2  203953  1627745 1627761 1629027 1629631
2     ATL3  203458  203953  1627761 1629027 1629631
3     ATL4  203458  203953  1629027 1629629 1629631
4     ATL5  203458  1628381 1629027 1629629 1629631
5     ATL6  203953  1628981 1628989 1629027 1629631
6     ATL7  203953  1627745 1628989 1629027 1629631
7     ATL8  1713    202323  203459  1627761 1628981
8     ATL9  1713    203459  1628981 1629027 1629631

Example 1

input_val = [1713]

   TeamAbb    P1       P2      P3      P4      P5
7     ATL8  1713    202323  203459  1627761 1628981
8     ATL9  1713    203459  1628981 1629027 1629631

Example 2

input_val = [1713,202323]

   TeamAbb    P1       P2      P3      P4      P5
7     ATL8  1713    202323  203459  1627761 1628981

So far each method I've tried hasn't worked (query, apply/any and mask). If anyone has ideas on how to approach this I'd really apreciate it.

Upvotes: 1

Answers (3)

oppressionslayer

Reputation: 7224

I think this acheives the result your looking for:

ee = df
ee = e.isin([1713]) 
ee ['match'] = ee[ee>0].count(axis=1) 
df.loc[ee['match']==ee['match'].max()]

output:

  TeamAbb    P1      P2       P3       P4       P5
7    ATL8  1713  202323   203459  1627761  1628981
8    ATL9  1713  203459  1628981  1629027  1629631

Input:

ee = df
ee = e.isin([1713, 202323]) 
ee ['match'] = ee[ee>0].count(axis=1) 
df.loc[ee['match']==ee['match'].max()]

output:

  TeamAbb    P1      P2      P3       P4       P5
7    ATL8  1713  202323  203459  1627761  1628981

input

ee = df 
ee = e.isin([203953,1628989])  
ee ['match'] = ee[ee>0].count(axis=1)  
df.loc[ee['match']==ee['match'].max()]

output

  TeamAbb      P1       P2       P3       P4       P5
5    ATL6  203953  1628981  1628989  1629027  1629631
6    ATL7  203953  1627745  1628989  1629027  1629631

Upvotes: 0

Brandon

Reputation: 1018

I would use an apply across the rows and check the difference of the input and row sets:

input_val = [1713,202323]
df_filter = (
             df[['P{}'.format(i) for i in range(1,6)]]
             .apply(lambda row: len(set(input_val) - set(row)) == 0 # check that all input vals are found somewhere in the row
                    , axis=1)
            ) 
df_new = df[df_filter] # apply the filter

Upvotes: 0

Ben.T

Reputation: 29635

You can sum all the boolean dataframes equal (eq) to each value in your list and then sum the resulting summed dataframe over the axis=1 and then check if it is the same value as the length of your input list:

input_val = [1713,202323]
mask = sum([df.eq(i) for i in input_val]).sum(1).eq(len(input_val))

print (df[mask])
  TeamAbb    P1      P2      P3       P4       P5
7    ATL8  1713  202323  203459  1627761  1628981

Upvotes: 1

Query dataframe columns using multiple variables/inputs

Answers (3)

Related Questions