Reputation: 319
I have a dataframe with a column (colD) created from keywords extracted from another column (colC). I used a list that contains all the keywords ('abc', 'xyz', 'efg', 'rst'), but if a keyword does not appear in colC, it does not get recorded in colD. The keywords may or may not also exist in two other columns (colA and colB). Is there a way to append any values from colA and/or colB to the respective list in colD if they are not already in that list?
Current state:
  colA colB           colC   colD
0  abc  NaN   hi there:abc  [abc]
1  xyz  NaN   blahblahblah     []
2  efg  rst  text rst text  [rst]
Desired output:
  colA colB           colC        colD
0  abc  NaN   hi there:abc       [abc]
1  xyz  NaN   blahblahblah       [xyz]
2  efg  rst  text rst text  [rst, efg]
Upvotes: 1
Views: 73
Reputation: 323266
IIUC, first stack the columns whose values you want to add to the list, then groupby the index level (level=0) and collect each row's values into a list:
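# One list of non-NaN colA/colB values per row
# (dropna() is a safeguard for pandas versions where stack() keeps NaN)
s = df[['colA', 'colB']].stack().dropna().groupby(level=0).apply(list)
# Use a set difference to find the values missing from colD, then append them to colD
df['colD'] = [y + list(set(x) - set(y)) for x, y in zip(s, df['colD'])]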
df
Out[118]:
  colA colB        colD
0  abc  NaN       [abc]
1  xyz  NaN       [xyz]
2  efg  rst  [rst, efg]
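One caveat with the stack/groupby version: a row where both colA and colB are NaN has no entry in s, so zip(s, df.colD) can pair lists from different rows, and the set difference does not guarantee the order of the appended values. Below is a minimal sketch of an alternative that uses plain list comprehensions instead of stack/groupby. The sample frame is rebuilt from the question, and the fourth row is a hypothetical extra case added only to show the all-NaN situation.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the last row is a hypothetical
# extra case in which both colA and colB are NaN.
df = pd.DataFrame({
    "colA": ["abc", "xyz", "efg", np.nan],
    "colB": [np.nan, np.nan, "rst", np.nan],
    "colC": ["hi there:abc", "blahblahblah", "text rst text", "nothing"],
    "colD": [["abc"], [], ["rst"], []],
})

# Candidate keywords per row: the non-NaN values of colA and colB,
# in column order. Rows without any value get an empty list.
extra = [
    [v for v in pair if pd.notna(v)]
    for pair in zip(df["colA"], df["colB"])
]

# Append only the candidates that colD does not already hold.
# dict.fromkeys removes duplicates while keeping first-seen order.
df["colD"] = [
    have + [v for v in dict.fromkeys(new) if v not in have]
    for have, new in zip(df["colD"], extra)
]

print(df["colD"].tolist())
# [['abc'], ['xyz'], ['rst', 'efg'], []]
```

Because extra is built row by row, it always has the same length as df, so the lists cannot get out of step with the index.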
Upvotes: 2