holastello

Reputation: 623

Get difference between list in Pandas dataframe and external list

I want to get the difference between column IDs and list all_IDs, and write this result to a new column IDs_missing in my pandas dataframe. In my case, the lists always contain unique values.

>>> import pandas as pd
>>> all_IDs = ['A','B','C','D','E','F']
>>> df = pd.DataFrame([{'IDs': ['B','C','F']}, {'IDs': ['A','B']}])

>>> df
         IDs
0  [B, C, F]
1     [A, B]

Expected output:

>>> df
         IDs   IDs_missing
0  [B, C, F]     [A, D, E]
1     [A, B]  [C, D, E, F]

Upvotes: 2

Views: 1228

Answers (2)

rafaelc

Reputation: 59274

Use set difference (the - operator) and take advantage of broadcasting:

set(all_IDs) - df.IDs.transform(set) 

0       (D, A, E)
1    (D, F, C, E)
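
The result above is a Series of sets in arbitrary order, whereas the question asks for lists in a new column. A minimal sketch of one way to get there, assuming the IDs are hashable and that sorted order is acceptable for the output (the name all_set is only illustrative):

```python
import pandas as pd

all_IDs = ['A', 'B', 'C', 'D', 'E', 'F']
df = pd.DataFrame([{'IDs': ['B', 'C', 'F']}, {'IDs': ['A', 'B']}])

# Build the reference set once instead of once per row
all_set = set(all_IDs)

# Per-row set difference, then sort to get a list in a stable order
df['IDs_missing'] = df['IDs'].apply(lambda ids: sorted(all_set - set(ids)))

print(df)
```

Sorting matters here because sets carry no ordering, so without it the order inside each list could differ from run to run.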

Upvotes: 3

Inder

Reputation: 3816

One way is:

# For each row, keep the IDs from all_IDs that are absent from that row's list
emp = []
for ids in df["IDs"]:
    emp.append([x for x in all_IDs if x not in ids])

df["missing"] = emp

For each row, this gives the list of IDs from all_IDs that are missing from that row's IDs. The output looks like:

         IDs       missing
0  [B, C, F]     [A, D, E]
1     [A, B]  [C, D, E, F]
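
The same idea can also be written as a single nested comprehension. This is only a sketch: it assumes the IDs are hashable so each row can be turned into a set for faster membership checks, and it keeps the column name missing used above. Unlike a plain set difference, it preserves the order of all_IDs.

```python
import pandas as pd

all_IDs = ['A', 'B', 'C', 'D', 'E', 'F']
df = pd.DataFrame([{'IDs': ['B', 'C', 'F']}, {'IDs': ['A', 'B']}])

# Convert each row's list to a set once, then filter all_IDs against it;
# iterating over all_IDs keeps the output in all_IDs order
df["missing"] = [
    [x for x in all_IDs if x not in present]
    for present in map(set, df["IDs"])
]

print(df)
```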

Upvotes: 0
