yololo

Pandas - appending values from other columns to a list column if the values do not already exist in that list

I have a dataframe with a column (colD) created from keywords extracted from another column (colC). I used a list that contains all the keywords ('abc', 'xyz', 'efg', 'rst'), but if a keyword does not appear in colC, it is not recorded in colD. The keywords may or may not also appear in two other columns (colA and colB). Is there a way to append any values from colA and/or colB to the corresponding list in colD, but only if they are not already in that list?

Current state:

  colA colB           colC   colD
0  abc  NaN   hi there:abc  [abc]
1  xyz  NaN   blahblahblah     []
2  efg  rst  text rst text  [rst]
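To make the example reproducible, here is a minimal sketch that rebuilds the current state. It assumes colD came from plain substring matching of each keyword against colC; the question does not show the actual extraction code, and the name `keywords` is hypothetical.

```python
import numpy as np
import pandas as pd

# hypothetical keyword list, using the values named in the question
keywords = ['abc', 'xyz', 'efg', 'rst']

df = pd.DataFrame({
    'colA': ['abc', 'xyz', 'efg'],
    'colB': [np.nan, np.nan, 'rst'],
    'colC': ['hi there:abc', 'blahblahblah', 'text rst text'],
})

# assumed extraction: keep every keyword that occurs as a substring of colC
df['colD'] = df['colC'].apply(lambda text: [k for k in keywords if k in text])
print(df)
```

With this assumption, 'xyz' and 'efg' are missing from colD because they only appear in colA, not in colC.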

Desired output:

  colA colB           colC        colD
0  abc  NaN   hi there:abc       [abc]
1  xyz  NaN   blahblahblah       [xyz]
2  efg  rst  text rst text  [rst, efg]

Answers (1)

BENY

IIUC, first stack the columns you want to add to the list, then group by the original index level (groupby(level=0)) to get one list per row:

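import pandas as pd
# stack colA/colB (dropna() covers pandas versions whose stack() keeps NaN), then one list per row
s = df[['colA', 'colB']].stack().dropna().groupby(level=0).apply(list)
# rows where colA and colB are both NaN vanish after stacking, so realign to df's index
s = s.reindex(df.index)
# append values not already in colD, keeping colA/colB order and skipping duplicates
df['colD'] = [y + [v for v in dict.fromkeys(x) if v not in y] if isinstance(x, list) else y
              for x, y in zip(s, df['colD'])]
df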
Out[118]: 
  colA colB        colD
0  abc  NaN       [abc]
1  xyz  NaN       [xyz]
2  efg  rst  [rst, efg]
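If the stack/groupby step feels indirect, a row-wise sketch gives the same result on the sample data. This is an alternative to the answer above, not the answerer's code; it assumes colD already holds Python lists and treats NaN in colA/colB as "nothing to add". The helper name `merge_keywords` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'colA': ['abc', 'xyz', 'efg'],
    'colB': [np.nan, np.nan, 'rst'],
    'colC': ['hi there:abc', 'blahblahblah', 'text rst text'],
    'colD': [['abc'], [], ['rst']],
})

def merge_keywords(existing, *candidates):
    """Return existing plus any non-NaN candidates not already present, in order."""
    merged = list(existing)  # copy so the original list objects are not mutated
    for value in candidates:
        if pd.notna(value) and value not in merged:
            merged.append(value)
    return merged

# one call per row: the current colD list plus that row's colA and colB values
df['colD'] = [merge_keywords(d, a, b)
              for a, b, d in zip(df['colA'], df['colB'], df['colD'])]
print(df['colD'].tolist())  # [['abc'], ['xyz'], ['rst', 'efg']]
```

Unlike a set difference, this keeps the order in which colA and colB appear and never adds the same value twice, at the cost of a plain Python loop over rows.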
