Reputation: 31
given: a list of elements 'ls' and a big df 'df', all the elements of 'ls' is in the 'df'.
ls = ['a0','a1','a2','b0','b2','c0',...,'c_k']
df = [['a0','b0','c0'],
['a0','b0','c1'],
['a0','b0','c2'],
...
['a_i','b_j','c_k']]
goal: I want to collect the rows set of the 'df' that contains the most elements of 'ls', such as ['a0','b0','c0'] is the best one. But at most a row just contain only 2 elements
tried: I tried enumerating 3 or 2 elements in 'ls', but it was too expensive and probably return None since there exist only 2 elements in some row. I tried to use a dictionary to count, but it didn't work either.
I've been puzzling over this problem all day, any help will be greatly appreciated.
Upvotes: 0
Views: 30
Reputation: 3936
I would go like this:
row_id = df.apply(lambda x: x.isin(ls).sum(), axis=1)
This will give you the row index with max entries in the list. The desired row can be obtained so:
df.iloc[row_id, :]
Upvotes: 1