qeaneH

Reputation: 31

Advanced conditional lookup in pandas (NumPy)

given: a list of elements 'ls' and a large DataFrame 'df'; all the elements of 'ls' appear in 'df'.

ls = ['a0','a1','a2','b0','b2','c0',...,'c_k']
df = [['a0','b0','c0'],
      ['a0','b0','c1'],
      ['a0','b0','c2'],
      ...
      ['a_i','b_j','c_k']]

goal: I want to collect the set of rows of 'df' that contain the most elements of 'ls'; for example, ['a0','b0','c0'] is the best one. However, in some cases a row contains at most 2 elements of 'ls'.

tried: I tried enumerating combinations of 3 or 2 elements of 'ls', but it was too expensive and would probably return None, since some rows contain only 2 elements. I also tried using a dictionary to count, but that didn't work either.

I've been puzzling over this problem all day; any help would be greatly appreciated.

Upvotes: 0

Views: 30

Answers (1)

quest

Reputation: 3936

I would go like this:

counts = df.apply(lambda x: x.isin(ls).sum(), axis=1)

This gives, for each row, the number of entries that appear in the list (a count per row, not a row index). The rows with the most matches can then be selected with a boolean mask:

df[counts == counts.max()]
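
For illustration, here is a minimal self-contained sketch of the counting idea. The sample frame, its column names, and the variable names `counts` and `best_rows` are made up for the example and are not from the question. It uses the vectorized `df.isin(ls).sum(axis=1)` to count matches per row and a boolean mask to keep every row tied for the maximum.

```python
import pandas as pd

# Hypothetical sample data; values and column names are assumptions.
ls = ["a0", "a1", "a2", "b0", "b2", "c0"]
df = pd.DataFrame(
    [
        ["a0", "b0", "c0"],
        ["a0", "b0", "c1"],
        ["a0", "b1", "c1"],
        ["a3", "b1", "c1"],
    ],
    columns=["a", "b", "c"],
)

# Count, for each row, how many of its cells appear in ls.
counts = df.isin(ls).sum(axis=1)

# Keep every row whose count equals the maximum (ties are all returned).
best_rows = df[counts == counts.max()]

print(counts.tolist())
print(best_rows)
```

Because the mask compares against `counts.max()`, this also works when no row contains three elements of the list: the rows with two matches are returned instead of nothing.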

Upvotes: 1
