Reputation: 25
Each row of my df holds a list of strings. I would like to add a new column containing the values from that row's list that also appear in a second, given list.
INPUT DATAFRAME:
1 ['a','b','d']
2 ['a','e','l']
3 ['a','b']
4 ['a','x','t','a','b','A']
This is my list:
list=['a','b','A']
output expected:
index tech final_tech
1 ['a','b','d'] a, b
2 ['a','e','l'] a
3 ['a','b'] a, b
4 ['a','x','t','a','b','A'] a, b, A
Any idea how I can do that?
Thanks
Upvotes: 0
Views: 211
Reputation: 13582
Given OP's dataframe
import pandas as pd
df = pd.DataFrame({'tech': [['a','b','d'],['a','e','l'],['a','b'],['a','x','t','a','b','A']]})
And the list
list=['a','b','A']
One can achieve OP's goal using pandas.Series.apply and a lambda function as follows (set makes sure each string appears only once, but it does not preserve the original order)
df['final_tech'] = df['tech'].apply(lambda x: ','.join(set([i for i in x if i in list])))
[Out]:
tech final_tech
0 [a, b, d] a,b
1 [a, e, l] a
2 [a, b] a,b
3 [a, x, t, a, b, A] A,a,b
If one doesn't care if the strings appear duplicated in final_tech, then simply use the following (for reference check the last row and compare to the previous output)
df['final_tech'] = df['tech'].apply(lambda x: ','.join([i for i in x if i in list]))
[Out]:
tech final_tech
0 [a, b, d] a,b
1 [a, e, l] a
2 [a, b] a,b
3 [a, x, t, a, b, A] a,a,b,A
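If each match should appear only once and in the order it first occurs in the row (as in OP's expected output, where the last row reads a, b, A), one possible sketch is to deduplicate with dict.fromkeys instead of set. The names wanted and matches_in_order are made up for this illustration and are not part of the original answer; wanted is used instead of list so the built-in is not shadowed.

```python
import pandas as pd

df = pd.DataFrame({'tech': [['a', 'b', 'd'],
                            ['a', 'e', 'l'],
                            ['a', 'b'],
                            ['a', 'x', 't', 'a', 'b', 'A']]})

# Hypothetical name: a set gives fast membership tests and
# avoids shadowing the built-in `list`.
wanted = {'a', 'b', 'A'}

def matches_in_order(values):
    # dict.fromkeys keeps only the first occurrence of each key,
    # in insertion order, so duplicates are dropped without reordering.
    unique_matches = dict.fromkeys(v for v in values if v in wanted)
    return ', '.join(unique_matches)

df['final_tech'] = df['tech'].apply(matches_in_order)
print(df)
```

The ', ' separator matches the formatting in the question; swap it for ',' to match the outputs above.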
Upvotes: 2