Sonkar

Reputation: 25

Pandas: if a row value contains items from a list as substrings, add a new column with the matching values

If a row value contains text from a list as substrings, I would like to add a new column to my df holding the values that match the list.

INPUT DATAFRAME:

1      ['a','b','d']
2      ['a','e','l']
3      ['a','b']
4      ['a','x','t','a','b','A']

This is my list:

list=['a','b','A']

output expected:

index  tech                         final_tech
1      ['a','b','d']                a, b
2      ['a','e','l']                a
3      ['a','b']                    a, b
4      ['a','x','t','a','b','A']    a, b, A

Any idea how I can do that?

Thanks

Upvotes: 0

Views: 211

Answers (1)

Gonçalo Peres

Reputation: 13582

Given OP's dataframe

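import pandas as pd

df = pd.DataFrame({'tech': [['a','b','d'],['a','e','l'],['a','b'],['a','x','t','a','b','A']]})

And the list (the name is kept from the question, but note that it shadows Python's built-in list)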

list=['a','b','A']

One can achieve OP's goal using pandas.Series.apply and a lambda function as follows (set makes sure each string appears only once, but it does not preserve order, so the order of the strings in final_tech may differ from the output shown)

df['final_tech'] = df['tech'].apply(lambda x: ','.join(set([i for i in x if i in list])))

[Out]:
                 tech final_tech
0           [a, b, d]        a,b
1           [a, e, l]          a
2              [a, b]        a,b
3  [a, x, t, a, b, A]      A,a,b

If duplicated strings in final_tech are acceptable, simply drop the set (for reference, compare the last row with the previous output)

df['final_tech'] = df['tech'].apply(lambda x: ','.join([i for i in x if i in list]))

[Out]:

                 tech final_tech
0           [a, b, d]        a,b
1           [a, e, l]          a
2              [a, b]        a,b
3  [a, x, t, a, b, A]    a,a,b,A
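
Neither variant above reproduces the a, b, A expected in the question for the last row: set gives no ordering guarantee, and the plain list comprehension keeps duplicates. A minimal sketch of one way to get both, assuming Python 3.7+ (where dict preserves insertion order), is to deduplicate with dict.fromkeys. The names wanted, wanted_set and keep_wanted are illustrative and not from the original post, and the ', ' separator is chosen to match the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({'tech': [['a', 'b', 'd'], ['a', 'e', 'l'], ['a', 'b'],
                            ['a', 'x', 't', 'a', 'b', 'A']]})

# Illustrative name: `wanted` instead of `list`, so the builtin is not shadowed.
wanted = ['a', 'b', 'A']
wanted_set = set(wanted)  # set only for fast membership tests, not for output order


def keep_wanted(items):
    # Keep items that are in the wanted list, then drop duplicates while
    # preserving first-seen order (dict keys are insertion-ordered).
    return ', '.join(dict.fromkeys(i for i in items if i in wanted_set))


df['final_tech'] = df['tech'].apply(keep_wanted)
print(df)
```

Like the answer's code, this matches whole list elements exactly (i in wanted_set), not substrings inside each element.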

Upvotes: 2
