Get difference between list in Pandas dataframe and external list

Question

For each row, I want the elements of the list all_IDs that are missing from that row's list in column IDs, and I want to write this result to a new column IDs_missing in my pandas DataFrame. In my case, the lists always contain unique values.

>>> import pandas as pd
>>> all_IDs = ['A','B','C','D','E','F']
>>> df = pd.DataFrame([{'IDs': ['B','C','F']}, {'IDs': ['A','B']}])

>>> df
         IDs
0  [B, C, F]
1     [A, B]

Expected output:

>>> df
         IDs   IDs_missing
0  [B, C, F]     [A, D, E]
1     [A, B]  [C, D, E, F]

rafaelc · Accepted Answer

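Use set difference (the - operator) and take advantage of broadcasting: each row's set is subtracted from the single set on the left. The result is a Series of sets, so the order of elements within each row is not guaranteed.

set(all_IDs) - df.IDs.transform(set)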

0       (D, A, E)
1    (D, F, C, E)
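
The question asks for a new column of lists in the order of all_IDs, whereas the expression above yields unordered sets. A minimal sketch of one way to get there, assuming a row-wise apply is acceptable for the data size; the helper name missing_ids is hypothetical and not part of the original answer:

```python
import pandas as pd

all_IDs = ['A', 'B', 'C', 'D', 'E', 'F']
df = pd.DataFrame([{'IDs': ['B', 'C', 'F']}, {'IDs': ['A', 'B']}])

def missing_ids(ids):
    """Return the entries of all_IDs that do not appear in ids (hypothetical helper)."""
    present = set(ids)  # set gives cheap membership tests
    # Iterate over all_IDs itself so the result keeps its original order.
    return [x for x in all_IDs if x not in present]

# Store the per-row result as a new column of lists.
df['IDs_missing'] = df['IDs'].apply(missing_ids)
print(df)
```

If the order of all_IDs carries no meaning, returning sorted(set(all_IDs) - set(ids)) from the helper gives the same lists for this sample data.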
