foxpal

Reputation: 623

Efficiently apply a function stored in one column to another column

Suppose I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'x': [1,2,3], 'f': [lambda x: x + 1,
                                       lambda x: x ** 2, 
                                       lambda x: x / 5]})

I'd like to apply each function in 'f' to the corresponding value in 'x' and store the result in a new column 'y'. I currently do this with apply, but it is a bit slow. Is there a better way? Is storing lambdas in DataFrames a bad idea?

df['y'] = df.apply(lambda row: row['f'](row['x']), axis=1)

Upvotes: 3

Views: 52

Answers (1)

jezrael

Reputation: 863761

Is storing lambdas in DataFrames a bad idea?

I think so, because pandas only works efficiently with scalar values, not with arbitrary Python objects such as functions.
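One possible alternative to keeping lambdas in the frame, sketched here as an illustration and not part of the original answer: store a short string key per row and keep the functions in a plain dict. The `OPS` registry, its key names, and the `op` column are hypothetical. Each function is then called once on the whole sub-Series of rows that share it, so the arithmetic itself stays vectorized.

```python
import pandas as pd

# Hypothetical registry: string keys stand in for the lambdas in column 'f'.
OPS = {
    "add_one": lambda s: s + 1,
    "square": lambda s: s ** 2,
    "div_five": lambda s: s / 5,
}

df = pd.DataFrame({"x": [1, 2, 3], "op": ["add_one", "square", "div_five"]})
df = pd.concat([df] * 1000, ignore_index=True)  # 3k rows

# Call each function once per group, on the whole Series of matching rows.
parts = [OPS[name](group["x"]) for name, group in df.groupby("op")]

# The pieces keep their original index, so assignment aligns row by row.
df["y"] = pd.concat(parts)
```

This only pays off when a small number of distinct functions is shared by many rows; with one unique function per row it degenerates into a Python loop again.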


A loop in a list comprehension is faster than apply:

df = pd.DataFrame({'x': [1,2,3], 'f': [lambda x: x + 1,
                                       lambda x: x ** 2, 
                                       lambda x: x / 5]})

#3k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [97]: %timeit df['y'] = df.apply(lambda row: row['f'](row['x']), axis=1)
104 ms ± 3.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [98]: %timeit df['y1'] = [f(x) for f, x in zip(df['f'], df['x'])]
3 ms ± 93 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#300k rows (starting again from the original 3-row df, not the 3k-row one)
df = pd.concat([df] * 100000, ignore_index=True)
In [102]: %timeit df['y'] = df.apply(lambda row: row['f'](row['x']), axis=1)
10.3 s ± 315 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [103]: %timeit df['y1'] = [f(x) for f, x in zip(df['f'], df['x'])]
318 ms ± 4.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
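For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard `timeit` module. The helper names `with_apply` and `with_zip` are made up for this example, and the absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'f': [lambda x: x + 1,
                                       lambda x: x ** 2,
                                       lambda x: x / 5]})
df = pd.concat([df] * 1000, ignore_index=True)  # 3k rows


def with_apply(frame):
    # Row-wise apply: builds a Series object for every row.
    return frame.apply(lambda row: row['f'](row['x']), axis=1)


def with_zip(frame):
    # Plain Python loop over the two columns in lockstep.
    return pd.Series([f(x) for f, x in zip(frame['f'], frame['x'])],
                     index=frame.index)


# Both approaches should produce the same values.
assert with_apply(df).tolist() == with_zip(df).tolist()

t_apply = timeit.timeit(lambda: with_apply(df), number=3)
t_zip = timeit.timeit(lambda: with_zip(df), number=3)
print(f"apply: {t_apply / 3:.4f} s per run")
print(f"zip:   {t_zip / 3:.4f} s per run")
```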

Upvotes: 2
