Reputation: 11460
I would like to use apply with two columns and pass additional arguments. My use case is to run a regex search on one column and write the search string to another column without overwriting existing values in that column. Maybe iterrows is a better option :).
import re
import pandas as pd
import numpy as np

# create the DataFrame with a random sample of six values
df = pd.DataFrame({
    'a': np.random.choice(['the_panda', 'it_python', 'my_shark'], 6),
})
df["b"] = ""
Yields:
a b
0 the_panda
1 my_shark
2 my_shark
3 the_panda
4 it_python
5 the_panda
Each time I apply my function, if the search string is found in column "a", I want to write it to column "b". So if I searched for "panda" and then "shark", the result would look like this:
a b
0 the_panda panda
1 my_shark shark
2 my_shark shark
3 the_panda panda
4 it_python
5 the_panda panda
I created a simple function:
def search_log(b, a, search_string):
    # return the search string on a match, otherwise keep the existing value of b
    so = re.search(search_string, a)
    if so:
        return search_string
    else:
        return b
However, I'm not sure whether there is a way to pass additional arguments through apply in this case. Here is what I'm trying:
search_string = 'panda'
df['b'] = df.apply(lambda x: search_log(x['b'],x['a']),args=(search_string,),axis=1)
Which yields:
TypeError: ('<lambda>() takes 1 positional argument but 2 were given', 'occurred at index 0')
...or
df['b'] = df.apply(lambda x: search_log(x['b'],x['a'],args=(search_string,),axis=1))
which yields:
KeyError: ('b', 'occurred at index a')
Upvotes: 1
Views: 130
Reputation: 101
string = ["panda", "shark", "python"]
# write the single term found in "a"; leave "" when zero or several terms match
df["b"] = df["a"].apply(lambda y: [x for x in string if x in y][0] if len([x for x in string if x in y]) == 1 else "")
Output (the DataFrame before and after):
a b
0 it_python
1 my_shark
2 my_shark
3 the_panda
4 my_shark
5 my_shark
a b
0 it_python python
1 my_shark shark
2 my_shark shark
3 the_panda panda
4 my_shark shark
5 my_shark shark
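On the original question of passing an extra argument: both attempts seem to fail for the same underlying reason. In the first, args=(search_string,) is forwarded to the lambda, which accepts only the row, hence the TypeError. In the second, args and axis=1 ended up inside the call to search_log, so apply ran column-wise and x['b'] was looked up on column "a", hence the KeyError. A minimal sketch of one way around this, assuming search_log is rewritten to take the whole row (that changed signature and the fixed sample data are my assumptions, not part of the question), followed by a vectorized alternative using str.contains:

```python
import re
import pandas as pd


def search_log(row, search_string):
    """Return search_string if it occurs in row['a'], else keep row['b']."""
    if re.search(search_string, row["a"]):
        return search_string
    return row["b"]


# Fixed sample data (instead of np.random.choice) so the result is reproducible.
df = pd.DataFrame(
    {"a": ["the_panda", "my_shark", "my_shark", "the_panda", "it_python", "the_panda"]}
)
df["b"] = ""

for term in ["panda", "shark"]:
    # With axis=1 the function receives each row; args=(term,) is passed
    # to search_log as the extra positional argument after the row.
    df["b"] = df.apply(search_log, axis=1, args=(term,))

# Vectorized alternative: only rows whose "a" matches the term are written,
# so values already present in "b" on other rows are left untouched.
df_vec = df[["a"]].copy()
df_vec["b"] = ""
for term in ["panda", "shark"]:
    df_vec.loc[df_vec["a"].str.contains(term), "b"] = term

print(df)
print(df_vec)
```

Keeping the lambda also works if the extra argument is simply passed inside it, e.g. df.apply(lambda x: search_log(x['b'], x['a'], search_string), axis=1) with the three-argument function from the question. The vectorized version avoids the row-by-row call and is likely faster on larger frames.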
Upvotes: 1