Reputation: 693
I keep running into dead ends here, and it's killing me.
Dataframe:
accountid col2 col3
1 ['abc','def','xyz'] ['abc','mda','xyz','sdi']
2 ['abc','asd','xyz','dib'] ['nio','ouy','abc']
3 ['abc','def','xyz'] ['abc','mda','xyz']
Notes
* Each field in col2 and col3 is a list
* The lists in col2 and col3 may not have the same number of items
I'm trying to create a col4 that holds the items in col3 that are not in col2. The result should look like this:
accountid col2 col3 col4
1 ['abc','def','xyz'] ['abc','mda','xyz','sdi'] ['mda','sdi']
2 ['abc','asd','xyz','dib'] ['nio','ouy','abc'] ['nio','ouy']
3 ['abc','def','xyz'] ['abc','mda','xyz'] ['mda']
What can I try next?
Upvotes: 0
Views: 1482
Reputation: 323326
Let us do a set difference:
s=df.col3.apply(set)-df.col2.apply(set)
0 {sdi, mda}
1 {nio, ouy}
2 {mda}
dtype: object
df['col4']=s.map(list)
Check the result
s.map(list)
0 [sdi, mda]
1 [nio, ouy]
2 [mda]
dtype: object
If your lists are not really lists but strings (for example, after reading the frame back from a CSV file), convert them first:
import ast
df.iloc[:,1:]=df.iloc[:,1:].applymap(ast.literal_eval)
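Putting both steps together, here is a minimal self-contained sketch. It assumes the lists arrive as strings (as they would from a CSV file) and uses the sample rows from the question. Two choices here are my own rather than the answer's: it parses each column with `Series.map` (since `DataFrame.applymap` is deprecated as of pandas 2.1), and it sorts each resulting set, because sets have no defined order.

```python
import ast

import pandas as pd

# Sample rows from the question, assumed to be stored as strings
# (what you get when the frame is read back from a CSV file).
df = pd.DataFrame({
    "accountid": [1, 2, 3],
    "col2": ["['abc','def','xyz']", "['abc','asd','xyz','dib']", "['abc','def','xyz']"],
    "col3": ["['abc','mda','xyz','sdi']", "['nio','ouy','abc']", "['abc','mda','xyz']"],
})

# Parse each string into a real Python list, one column at a time.
for col in ["col2", "col3"]:
    df[col] = df[col].map(ast.literal_eval)

# Element-wise set difference on two object Series: items in col3 not in col2.
diff = df["col3"].map(set) - df["col2"].map(set)

# Sets are unordered, so sort each result to get a deterministic list.
df["col4"] = diff.map(sorted)

print(df[["accountid", "col4"]])
```

Sorting changes the order relative to col3. If you need col3's original order, filter the list instead of converting it to a set.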
Upvotes: 3
Reputation: 1420
Try this: apply the lambda to each row with axis=1:
df['col4'] = df.apply(lambda x : list(set(x['col3']).difference(set(x['col2']))), axis=1)
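Both answers go through sets, so the order of items in col4 is not guaranteed to match col3, and duplicates in col3 are collapsed. If you want output that keeps col3's order, exactly as in the expected table, one option is a plain list comprehension over the two columns. This is a sketch, assuming the columns already hold real lists. The helper name `missing_from` is hypothetical.

```python
import pandas as pd

# Sample rows from the question, with the lists already parsed.
df = pd.DataFrame({
    "accountid": [1, 2, 3],
    "col2": [["abc", "def", "xyz"], ["abc", "asd", "xyz", "dib"], ["abc", "def", "xyz"]],
    "col3": [["abc", "mda", "xyz", "sdi"], ["nio", "ouy", "abc"], ["abc", "mda", "xyz"]],
})


def missing_from(source, reference):
    """Return the items of `source` that are not in `reference`, in source order."""
    lookup = set(reference)  # build the lookup set once per row
    return [item for item in source if item not in lookup]


# zip pairs up col2 and col3 row by row, so df.apply(axis=1) is not needed.
df["col4"] = [missing_from(c3, c2) for c2, c3 in zip(df["col2"], df["col3"])]

print(df["col4"].tolist())
```

Unlike the set-based versions, this keeps any duplicate items in col3 that are missing from col2. It also usually runs faster than `apply` with `axis=1` on large frames, because it skips building a Series for every row.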
Upvotes: 1