Reputation: 623
I want to get the difference between the column IDs and the list all_IDs, and write the result to a new column IDs_missing in my pandas dataframe. In my case, the lists always contain unique values.
>>> all_IDs = ['A','B','C','D','E','F']
>>> df = pd.DataFrame([{'IDs': ['B','C','F']}, {'IDs': ['A','B']}])
>>> df
         IDs
0  [B, C, F]
1     [A, B]
Expected output:
>>> df
         IDs   IDs_missing
0  [B, C, F]     [A, D, E]
1     [A, B]  [C, D, E, F]
Upvotes: 2
Views: 1228
Reputation: 59274
Use set differencing (operator -) and take advantage of broadcasting:
set(all_IDs) - df.IDs.transform(set)
0 (D, A, E)
1 (D, F, C, E)
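Note that the expression above produces sets, whose element order is arbitrary, and does not assign the result to a column. A minimal sketch of one way to get lists into IDs_missing, assuming alphabetical order is acceptable (it matches the expected output here only because all_IDs happens to be sorted); the name all_set is introduced just for this example:

```python
import pandas as pd

all_IDs = ['A', 'B', 'C', 'D', 'E', 'F']
df = pd.DataFrame([{'IDs': ['B', 'C', 'F']}, {'IDs': ['A', 'B']}])

# Build the full set once, then subtract each row's IDs from it.
all_set = set(all_IDs)

# sorted() turns each resulting set back into a list with a stable order.
df['IDs_missing'] = df['IDs'].apply(lambda ids: sorted(all_set - set(ids)))
print(df)
```

If all_IDs is not sorted and its original order matters, sorting the set difference will not reproduce that order, so a list comprehension over all_IDs would be the safer choice.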
Upvotes: 3
Reputation: 3816
One way is to loop over the rows:
emp = []
for i in range(len(df)):
    # keep every ID from all_IDs that does not appear in row i
    emp.append([x for x in all_IDs if x not in df["IDs"][i]])
df["missing"] = emp
This gives, for each row, the list of IDs from all_IDs that are missing in that row. The output looks like:
         IDs       missing
0  [B, C, F]     [A, D, E]
1     [A, B]  [C, D, E, F]
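One caveat with the loop: df["IDs"][i] looks rows up through the index, so it assumes the index labels are 0, 1, 2, and so on. A sketch of a variant that iterates over the column values directly and therefore does not depend on the index; the 'x' and 'y' labels are hypothetical and only there to illustrate the point:

```python
import pandas as pd

all_IDs = ['A', 'B', 'C', 'D', 'E', 'F']

# Hypothetical non-default index labels, used only for illustration.
df = pd.DataFrame({'IDs': [['B', 'C', 'F'], ['A', 'B']]}, index=['x', 'y'])

# Iterate over the column's values in row order; no index lookup is involved.
# The inner comprehension keeps the order of all_IDs.
df['missing'] = [[x for x in all_IDs if x not in ids] for ids in df['IDs']]
print(df)
```

Because the inner comprehension walks all_IDs, the missing IDs come out in the same order as in all_IDs, without any sorting.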
Upvotes: 0