Python: Efficiently check if value in a list is in another list

Question

I have a dataframe, user_df, with ~500,000 rows with the following format:

|  id  |  other_ids   |
|------|--------------|
|  1   |['abc', 'efg']|
|  2   |['bbb']       |
|  3   |['ccc', 'ddd']|

I also have a list, other_ids_that_clicked, with ~5000 items full of other ids:

 ['abc', 'efg', 'ccc']

I'm looking to de-dupe other_ids_that_clicked against user_df by adding a clicked column to user_df that is 1 when any value in a row's other_ids appears in other_ids_that_clicked, and 0 otherwise:

|  id  |  other_ids   |  clicked  |
|------|--------------|-----------|
|  1   |['abc', 'efg']|     1     |
|  2   |['bbb']       |     0     |
|  3   |['ccc', 'ddd']|     1     |

The way I'm checking is by looping through other_ids_that_clicked for each row in user_df.

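def otheridInList(row):
    # For every row, scan the whole clicked list (~5,000 items), and each
    # `in` test is itself a linear scan of the row's list, so this is slow.
    for other_id in other_ids_that_clicked:
        if other_id in row['other_ids']:
            return 1  # found a clicked id, stop early
    return 0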
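A minimal sketch of a cheaper pure-Python check, assuming other_ids_that_clicked fits in memory as a set: building the set once makes each membership test average constant time, and set.isdisjoint stops at the first shared id. The helper name clicked_flag is hypothetical, and the sample data is the small example from above.

```python
import pandas as pd

# Sample data from the question (the real frame has ~500,000 rows).
user_df = pd.DataFrame({
    "id": [1, 2, 3],
    "other_ids": [["abc", "efg"], ["bbb"], ["ccc", "ddd"]],
})
other_ids_that_clicked = ["abc", "efg", "ccc"]

# Build the lookup set once instead of scanning the list for every row.
clicked_set = set(other_ids_that_clicked)

def clicked_flag(ids):
    """Hypothetical helper: 1 if any id in the row's list was clicked, else 0."""
    # isdisjoint returns False as soon as it finds a shared element.
    return int(not clicked_set.isdisjoint(ids))

# A list comprehension over the column avoids building a Series per row,
# which is what DataFrame.apply(..., axis=1) does.
user_df["clicked"] = [clicked_flag(ids) for ids in user_df["other_ids"]]
print(user_df)
```

This keeps the row-by-row structure of the original function, so it is easy to compare against, but the inner loop over 5,000 items is gone.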

This is taking forever, so I was looking for suggestions on the best way to approach this.

Thanks!

cs95 · Accepted Answer

You can actually speed this up quite a bit. Expand the list column into its own dataframe (one column per list position), then use DataFrame.isin to check every element at once -

import pandas as pd

l = ['abc', 'efg', 'ccc']
df['clicked'] = pd.DataFrame(df.other_ids.tolist(), index=df.index).isin(l).any(axis=1).astype(int)

   id   other_ids  clicked
0   1  [abc, efg]        1
1   2       [bbb]        0
2   3  [ccc, ddd]        1
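A self-contained sketch of the same one-liner, assuming pandas is installed and df has a non-default index (as it often will after filtering or sorting). The name expanded is hypothetical. It shows why the expanded frame should reuse df.index: pd.DataFrame(list_of_lists) gets a fresh 0..n-1 RangeIndex, and assigning a Series back to df aligns on index labels, so mismatched labels would produce NaN instead of the flags.

```python
import pandas as pd

l = ["abc", "efg", "ccc"]
# Non-default index labels (10, 20, 30) to expose the alignment issue.
df = pd.DataFrame(
    {"id": [1, 2, 3], "other_ids": [["abc", "efg"], ["bbb"], ["ccc", "ddd"]]},
    index=[10, 20, 30],
)

# Reuse df.index so the row-wise result lines up with df when assigned back.
expanded = pd.DataFrame(df["other_ids"].tolist(), index=df.index)
df["clicked"] = expanded.isin(l).any(axis=1).astype(int)
print(df)
```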

Details

First, convert other_ids into a list of lists -

i = df.other_ids.tolist()

i
[['abc', 'efg'], ['bbb'], ['ccc', 'ddd']]

Now, load it into a new dataframe -

j = pd.DataFrame(i)

j
     0     1
0  abc   efg
1  bbb  None
2  ccc   ddd
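A short sketch of how the padding behaves with lists of uneven length, using a hypothetical ragged input that includes an empty list. The expanded frame is as wide as the longest list, and shorter rows are filled with a missing value (shown as None for object columns), which never matches any id in l.

```python
import pandas as pd

# Hypothetical ragged input: list lengths 3, 0 and 1.
ragged = [["abc", "efg", "hij"], [], ["ccc"]]
j = pd.DataFrame(ragged)

# Width equals the longest list; missing cells are padded with missing values.
print(j.shape)
print(j)
# The padded cells never match, so only the last row is flagged.
print(j.isin(["ccc"]).any(axis=1).tolist())
```

A row whose list is empty therefore ends up all-missing and gets clicked = 0, which matches the intended semantics.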

Perform checks with isin -

k = j.isin(l)

k
       0      1
0   True   True
1  False  False
2   True  False

clicked is 1 for any row that contains at least one True, which DataFrame.any(axis=1) checks row by row. The boolean result is then converted to integers.

k.any(axis=1).astype(int)

0    1
1    0
2    1
dtype: int64
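A variation on the same idea, swapping in Series.explode (available since pandas 0.25) instead of building a padded dataframe. This sketch assumes df has a unique index; the name exploded is hypothetical. explode produces one row per list element and repeats the original index label, so grouping on that label collapses the element-wise isin result back to one flag per row, without allocating None padding for short lists.

```python
import pandas as pd

l = ["abc", "efg", "ccc"]
df = pd.DataFrame({"id": [1, 2, 3],
                   "other_ids": [["abc", "efg"], ["bbb"], ["ccc", "ddd"]]})

# One row per list element; the original index label is repeated.
exploded = df["other_ids"].explode()
# Test each element, then reduce back to one flag per original row label.
df["clicked"] = exploded.isin(l).groupby(level=0).any().astype(int)
print(df)
```

An empty list explodes to a single missing value, which isin reports as False, so such rows also get clicked = 0.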
