Robert

Reputation: 159

Pandas Series.apply - use arguments from another Series?

I have the following statement:

>>> df['result'] = df['value'].apply(myfunc, args=(x,y,z))

The Python function myfunc was written before I started using Pandas and is set up to take single values. The arguments x and z are fixed and can easily be passed as a variable or literal, but I have a column in my DataFrame that represents the y parameter, so I'm looking for a way to use that row's value for each row (they differ from row to row).

i.e. df['y'] is a series of values that I'd like to send in to myfunc

My workaround is as follows:

values = list(df['value'])
y = list(df['y'])
df['result'] = pd.Series([myfunc(values[i],x,y[i],z) for i in range(0,len(values))])

Any better approaches?

EDIT

Using functools.partial has a gotcha that I was able to work out. If the call does not pass the remaining arguments by keyword, they are bound positionally, and a positional argument can land on a parameter the partial has already fixed. That is what raises the 'myfunc() got multiple values for...' error.

I modified the answer from coldspeed:

# Function myfunc takes named arguments arg1, arg2, arg3 and arg4
#   The values for arg2 and arg4 don't change so I'll set them when
#   defining the partial (assume x and z have values set)
myfunc_p = partial(myfunc, arg2=x, arg4=z)
df['result'] = [myfunc_p(arg1=w, arg3=y) for w, y in zip(df['value'], df['y'])]
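To illustrate the gotcha described above, here is a minimal sketch using only the standard library. The body of myfunc is a hypothetical stand-in (the real function is not shown in the question); only the parameter names arg1 to arg4 come from the comments above.

```python
from functools import partial

# Hypothetical stand-in for myfunc; it just echoes its arguments.
def myfunc(arg1, arg2, arg3, arg4):
    return (arg1, arg2, arg3, arg4)

# Fix arg2 and arg4 by keyword, as in the snippet above.
myfunc_p = partial(myfunc, arg2="x", arg4="z")

# Keyword call: every remaining parameter is named, so nothing collides.
ok = myfunc_p(arg1=1, arg3=3)

# Positional call: the second positional value is bound to arg2,
# which the partial already supplies by keyword, so Python raises TypeError.
try:
    myfunc_p(1, 3)
    error = None
except TypeError as exc:
    error = str(exc)

print(ok)
print(error)
```

Passing the per-row arguments by keyword sidesteps the collision regardless of where the fixed parameters sit in the signature.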

Upvotes: 1

Views: 930

Answers (2)

rer

Reputation: 1268

You could also apply over the rows with a lambda like so:

df['result'] = df.apply(lambda row: myfunc(row['value'], y=row['y'], x=x, z=z), axis=1)
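As a self-contained sketch of the row-wise approach, assuming a hypothetical myfunc(value, x, y, z) and made-up sample data (neither is given in the question):

```python
import pandas as pd

# Hypothetical stand-in for the asker's scalar function.
def myfunc(value, x, y, z):
    return value * x + y - z

# Made-up sample data; 'y' varies per row, x and z are fixed.
df = pd.DataFrame({"value": [1, 2, 3], "y": [10, 20, 30]})
x, z = 2, 1

# axis=1 hands each row to the lambda as a Series, so both columns
# are available by name inside the call.
df["result"] = df.apply(
    lambda row: myfunc(row["value"], x=x, y=row["y"], z=z), axis=1
)
print(df)
```

Because the result of apply carries the DataFrame's own index, the assignment lines up with the original rows even when the index is not the default 0..n-1.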

Upvotes: 3

cs95

Reputation: 402593

I think what you're doing is fine. I'd maybe make a couple of improvements:

from functools import partial
myfunc_p = partial(myfunc, x=x, z=z)
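# Pass y by keyword: a second positional argument would be bound to x,
# which the partial already fixes.
df['result'] = [myfunc_p(v, y=y) for v, y in zip(df['value'], df['y'])]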

You don't need to wrap the list in a pd.Series call, and you can clean up your function call by fixing two of the arguments with functools.partial.
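Assigning the plain list can also be the safer choice when the DataFrame has a non-default index. A list is assigned by position, whereas a new pd.Series gets a fresh 0..n-1 index and is aligned on labels. A small sketch with made-up data:

```python
import pandas as pd

# Made-up frame whose index does not start at 0.
df = pd.DataFrame({"value": [1, 2, 3]}, index=[10, 11, 12])
doubled = [v * 2 for v in df["value"]]

# A list is assigned positionally, row by row.
df["from_list"] = doubled

# A new Series has index 0, 1, 2; none of those labels exist in df,
# so label alignment leaves the column filled with NaN.
df["from_series"] = pd.Series(doubled)
print(df)
```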

Another option is np.vectorize, which gives more concise code. Disclaimer: it does not actually vectorize the function, it just hides the loop, so in most cases the list comprehension should be faster.

import numpy as np
myfunc_v = np.vectorize(partial(myfunc, x=x, z=z))
# y is passed by keyword so it is not bound positionally to x
df['result'] = myfunc_v(df['value'], y=df['y'])
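To see that np.vectorize only hides the loop, one can count how often the wrapped function runs. This is a sketch with a hypothetical myfunc and made-up data. The otypes argument is set so that NumPy does not make an extra trial call to infer the output type.

```python
from functools import partial

import numpy as np
import pandas as pd

calls = []  # records every scalar invocation

# Hypothetical stand-in for the asker's scalar function.
def myfunc(value, x, y, z):
    calls.append((value, y))
    return value * x + y - z

df = pd.DataFrame({"value": [1, 2, 3], "y": [10, 20, 30]})
x, z = 2, 1

# otypes fixes the output dtype up front.
myfunc_v = np.vectorize(partial(myfunc, x=x, z=z), otypes=[float])
df["result"] = myfunc_v(df["value"], y=df["y"])

# One Python-level call per row: the loop still happens, just inside NumPy.
print(len(calls), df["result"].tolist())
```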

Upvotes: 1
