Reputation: 169
The lists ( 2 o more) to compare with the DataFrame (column A) can change depending tha data of the column B
df = pd.DataFrame({'A': [['10', '20', '30', '40'],['50', '60', '70', '80'],['90','100','110','120'],'B': ['X','Y','Z']})
findA = ['10','25', '20','35']
findB = ['60','85]
.
.
.
findX
A B
0 [10, 20, 30, 40] X
1 [50, 60, 70, 80] Y
2 [90, 100, 110, 120] Z
This will be the desired behavior
If the data of column B is 'X' then compare findA list with column A and add the diference (findA) to the column C
If the data of column B is 'Y' then compare findB list with column A and add the diference (findB) to the column C
else NaN
A B C
0 [10, 20, 30, 40] X ['25','35']
1 [50, 60, 70, 80] Y ['85']
2 [90, 100, 110, 120] Z NaN
Thanks in advance
Upvotes: 0
Views: 33
Reputation: 862761
For improve performance is filtered rows and processing only matched values by masks, then is created new column by DataFrame.loc
with difference of sets:
m1 = df['B'].eq('X')
df.loc[m1, 'C'] = df.loc[m1, 'A'].apply(lambda x: list(set(findA) - set(x)))
m2 = df['B'].eq('Y')
df.loc[m2, 'C'] = df.loc[m2, 'A'].apply(lambda x: list(set(findB) - set(x)))
print (df)
A B C
0 [10, 20, 30, 40] X [35, 25]
1 [50, 60, 70, 80] Y [85]
2 [90, 100, 110, 120] Z NaN
More dynamic solution:
d = {'X': findA, 'Y': findB}
for k, v in d.items():
m = df['B'].eq(k)
df.loc[m, 'C'] = df.loc[m, 'A'].apply(lambda x: list(set(v) - set(x)))
print (df)
A B C
0 [10, 20, 30, 40] X [35, 25]
1 [50, 60, 70, 80] Y [85]
2 [90, 100, 110, 120] Z NaN
Solution with numpy.select
is possible, but here apply
processing all values, so if large data it should be slowier:
m1 = df['B'].eq('X')
m2 = df['B'].eq('Y')
a = df['A'].apply(lambda x: list(set(findA) - set(x)))
b = df['A'].apply(lambda x: list(set(findB) - set(x)))
df['C'] = np.select([m1, m2], [a, b], np.nan)
print (df)
A B C
0 [10, 20, 30, 40] X [35, 25]
1 [50, 60, 70, 80] Y [85]
2 [90, 100, 110, 120] Z NaN
Upvotes: 1
Reputation: 148965
You could first build a temporary dataframe to link the findx
lists to the values of column B
and merge it into df
:
tmp = pd.DataFrame({'B': ['X', 'Y', 'Z'], 'find': [findA, findB, findC]})
df['C'] = df.merge(tmp, on='B').apply(lambda x: list(set(x['find']) - set(x['A'])), axis=1)
With a reasonable findC
it gives:
A B C
0 [10, 20, 30, 40] X [35, 25]
1 [50, 60, 70, 80] Y [85]
2 [90, 100, 110, 120] Z []
If you really need a NaN instead of the empty list, just do:
df.loc[df['C'].apply(len) == 0, 'C'] = np.nan
Upvotes: 1