Reputation: 2189
I have a DataFrame like this:
col1 col2
[1,2] [3,4]
[5,6] [7,8]
[9,5] [1,3]
[8,4] [3,6]
I also have a function f that takes two lists as input and returns a single value. I want to add a new column, col3, whose values are the result of applying f to the col1 and col2 values of each row, so the final DataFrame would look like this:
col1 col2 col3
[1,2] [3,4] 3
[5,6] [7,8] 5
[9,5] [1,3] 8
[8,4] [3,6] 9
I can calculate the col3 values with a for loop that passes the list values to f one row at a time, but the execution time is long. I'm looking for a more efficient, Pythonic way to do this.
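For reference, here is a minimal sketch of the setup and the row-by-row loop described above. The DataFrame is rebuilt from the table in the question. The body of f is not shown in the question, so the f below is a hypothetical stand-in (min(x + y), the same sample function used in the answer's timing section). Its output therefore differs from the col3 values in the question.

```python
import pandas as pd

# Rebuild the example input; each cell holds a Python list
df = pd.DataFrame({
    "col1": [[1, 2], [5, 6], [9, 5], [8, 4]],
    "col2": [[3, 4], [7, 8], [1, 3], [3, 6]],
})

# Hypothetical stand-in for f (the real f is not shown):
# concatenate both lists and return the smallest element
def f(x, y):
    return min(x + y)

# Plain for loop over row positions, one call to f per row
values = []
for i in range(len(df)):
    values.append(f(df["col1"].iloc[i], df["col2"].iloc[i]))
df["col3"] = values
```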
Upvotes: 0
Views: 42
Reputation: 862771
Operations on lists stored in pandas columns cannot be vectorized well; one possible solution is a list comprehension:
df['col3'] = [func(a, b) for a,b in zip(df.col1, df.col2)]
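A self-contained version of the list comprehension, as a sketch. It assumes func is the sample function min(x + y) from the timing section; substitute your own function.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [[1, 2], [5, 6], [9, 5], [8, 4]],
    "col2": [[3, 4], [7, 8], [1, 3], [3, 6]],
})

# Assumed sample function: smallest element of the two lists combined
def func(x, y):
    return min(x + y)

# zip yields one (list, list) pair per row directly from the two columns,
# avoiding the per-row Series objects that apply(axis=1) creates
df["col3"] = [func(a, b) for a, b in zip(df.col1, df.col2)]
```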
A solution with pandas apply (should be slower):
df['col3'] = df.apply(lambda x: func(x.col1, x.col2), axis=1)
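A quick sketch to check that the apply version produces the same values as the list comprehension, again assuming the sample func min(x + y). With axis=1, apply passes each row as a Series, so row.col1 and row.col2 are the two lists.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [[1, 2], [5, 6], [9, 5], [8, 4]],
    "col2": [[3, 4], [7, 8], [1, 3], [3, 6]],
})

# Assumed sample function
def func(x, y):
    return min(x + y)

# Row-wise apply: one Series is built per row before func is called
df["col3_apply"] = df.apply(lambda row: func(row.col1, row.col2), axis=1)

# List comprehension over the raw column values for comparison
df["col3_listcomp"] = [func(a, b) for a, b in zip(df.col1, df.col2)]

print(df[["col3_apply", "col3_listcomp"]])
```

Both columns hold the same values; the difference is only in how much per-row overhead pandas adds.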
If the function can be vectorized and all lists in the columns have the same length, it may be possible to rewrite it with numpy. If not, rewriting the function with numba might help.
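As an illustration of the numpy route, here is a sketch of a vectorized equivalent of the sample func min(x + y). It assumes every list in col1 has the same length, and likewise for col2, so each column can be stacked into a 2-D array. A different f would need its own vectorized rewrite.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [[1, 2], [5, 6], [9, 5], [8, 4]],
    "col2": [[3, 4], [7, 8], [1, 3], [3, 6]],
})

# Assumption: equal-length lists, so each column becomes
# a 2-D array of shape (n_rows, list_length)
a = np.array(df["col1"].tolist())
b = np.array(df["col2"].tolist())

# Vectorized min(x + y): put the two arrays side by side,
# then take the minimum across each row in a single call
df["col3"] = np.hstack([a, b]).min(axis=1)
```

If the list lengths vary, np.array cannot build a regular 2-D array from them, and the list comprehension above is the simpler option.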
Performance with a sample custom function:
import pandas as pd
#[40000 rows x 2 columns]
df = pd.concat([df] * 10000, ignore_index=True)
#sample function
def func(x, y):
    return min(x + y)
In [144]: %timeit df['col31'] = [func(a, b) for a,b in zip(df.col1, df.col2)]
39.6 ms ± 331 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [145]: %timeit df['col32'] = df.apply(lambda x: func(x.col1, x.col2), axis=1)
2.25 s ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
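To reproduce a comparison like this outside IPython, here is a sketch using the standard timeit module. It builds the same 40000-row frame and adds the numpy variant, which assumes equal-length lists. No timings are shown because they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "col1": [[1, 2], [5, 6], [9, 5], [8, 4]],
    "col2": [[3, 4], [7, 8], [1, 3], [3, 6]],
})
# 4 rows * 10000 = 40000 rows, as in the timing above
df = pd.concat([base] * 10000, ignore_index=True)

# Sample function: smallest element of both lists combined
def func(x, y):
    return min(x + y)

def with_listcomp():
    return [func(a, b) for a, b in zip(df.col1, df.col2)]

def with_apply():
    return df.apply(lambda row: func(row.col1, row.col2), axis=1)

def with_numpy():
    # Assumes equal-length lists so the columns stack into 2-D arrays
    a = np.array(df["col1"].tolist())
    b = np.array(df["col2"].tolist())
    return np.hstack([a, b]).min(axis=1)

for name, fn in [("list comprehension", with_listcomp),
                 ("apply(axis=1)", with_apply),
                 ("numpy", with_numpy)]:
    # number=1 keeps the slow apply variant from dominating the run
    seconds = timeit.timeit(fn, number=1)
    print(f"{name}: {seconds:.4f} s")
```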
Upvotes: 1