Reputation: 394
I've encountered a situation where I need to filter a dataframe via input(s) that are found in columns P1-P5 below. There could be anywhere from 1 to 5 inputs and they could be located in any of P1-P5.
TeamAbb P1 P2 P3 P4 P5
0 ATL1 203953 1627745 1629027 1629629 1629631
1 ATL2 203953 1627745 1627761 1629027 1629631
2 ATL3 203458 203953 1627761 1629027 1629631
3 ATL4 203458 203953 1629027 1629629 1629631
4 ATL5 203458 1628381 1629027 1629629 1629631
5 ATL6 203953 1628981 1628989 1629027 1629631
6 ATL7 203953 1627745 1628989 1629027 1629631
7 ATL8 1713 202323 203459 1627761 1628981
8 ATL9 1713 203459 1628981 1629027 1629631
Example 1
input_val = [1713]
TeamAbb P1 P2 P3 P4 P5
7 ATL8 1713 202323 203459 1627761 1628981
8 ATL9 1713 203459 1628981 1629027 1629631
Example 2
input_val = [1713,202323]
TeamAbb P1 P2 P3 P4 P5
7 ATL8 1713 202323 203459 1627761 1628981
So far each method I've tried hasn't worked (query, apply/any and mask). If anyone has ideas on how to approach this I'd really apreciate it.
Upvotes: 1
Views: 47
Reputation: 7224
I think this acheives the result your looking for:
ee = df
ee = e.isin([1713])
ee ['match'] = ee[ee>0].count(axis=1)
df.loc[ee['match']==ee['match'].max()]
output:
TeamAbb P1 P2 P3 P4 P5
7 ATL8 1713 202323 203459 1627761 1628981
8 ATL9 1713 203459 1628981 1629027 1629631
Input:
ee = df
ee = e.isin([1713, 202323])
ee ['match'] = ee[ee>0].count(axis=1)
df.loc[ee['match']==ee['match'].max()]
output:
TeamAbb P1 P2 P3 P4 P5
7 ATL8 1713 202323 203459 1627761 1628981
input
ee = df
ee = e.isin([203953,1628989])
ee ['match'] = ee[ee>0].count(axis=1)
df.loc[ee['match']==ee['match'].max()]
output
TeamAbb P1 P2 P3 P4 P5
5 ATL6 203953 1628981 1628989 1629027 1629631
6 ATL7 203953 1627745 1628989 1629027 1629631
Upvotes: 0
Reputation: 1018
I would use an apply across the rows and check the difference of the input and row sets:
input_val = [1713,202323]
df_filter = (
df[['P{}'.format(i) for i in range(1,6)]]
.apply(lambda row: len(set(input_val) - set(row)) == 0 # check that all input vals are found somewhere in the row
, axis=1)
)
df_new = df[df_filter] # apply the filter
Upvotes: 0
Reputation: 29635
You can sum
all the boolean dataframes equal (eq
) to each value in your list and then sum
the resulting summed dataframe over the axis=1
and then check if it is the same value as the length of your input list:
input_val = [1713,202323]
mask = sum([df.eq(i) for i in input_val]).sum(1).eq(len(input_val))
print (df[mask])
TeamAbb P1 P2 P3 P4 P5
7 ATL8 1713 202323 203459 1627761 1628981
Upvotes: 1