Reputation: 611
I have a dataframe
df_in = pd.DataFrame([[1,"A",32,">30"],[2,"B",12,"<10"],[3,"C",45,">=45"]],columns=['id', 'input', 'val', 'cond'])
I want to evaluate each value in the "val" column against the condition stored in the "cond" column and get the True/False result in an "Output" column.
Expected Output:
df_out = pd.DataFrame([[1,"A",32,">30",True],[2,"B",12,"<10",False],[3,"C",45,">=45",True]],columns=['id', 'input', 'val', 'cond',"Output"])
How can I do this?
Upvotes: 0
Views: 377
Reputation: 24304
You can try:
df_in['output']=pd.eval(df_in['val'].astype(str)+df_in['cond'])
OR
If you need better performance, use the method below. It relies on Python's built-in eval, so also see this thread on the risks, but I think in your case it is safe to use eval:
df_in['output']=list(map(lambda x:eval(x),(df_in['val'].astype(str)+df_in['cond']).tolist()))
OR
Even faster (np.char.add is the public alias for defchararray.add; numpy.core is a private module and is deprecated in NumPy 2):
import numpy as np
df_in['output']=list(map(lambda x:eval(x),np.char.add(df_in['val'].values.astype(str),df_in['cond'].values.astype(str))))
Output of df_in:
   id input  val  cond  output
0   1     A   32   >30    True
1   2     B   12   <10   False
2   3     C   45  >=45    True
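Since the eval-based variants above execute whatever text is in the "cond" column, here is a minimal sketch of an eval-free alternative for cases where the conditions are not trusted. This is a different technique from the ones in this answer: it assumes every condition is one comparison symbol followed by a plain number, and the names OPS, COND_RE and check are hypothetical helpers introduced only for illustration.

```python
import operator
import re

import pandas as pd

# Map each supported comparison symbol to its function in the operator module.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}

# Two-character symbols come first so ">=" is not read as ">" followed by "=".
# Assumption: the threshold is a plain integer or decimal number.
COND_RE = re.compile(r"\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*")


def check(val, cond):
    """Return True if val satisfies a condition string such as '>30'."""
    match = COND_RE.fullmatch(cond)
    if match is None:
        # Anything that is not "<symbol><number>" is rejected, never executed.
        raise ValueError(f"unsupported condition: {cond!r}")
    symbol, threshold = match.groups()
    return OPS[symbol](val, float(threshold))


df_in = pd.DataFrame(
    [[1, "A", 32, ">30"], [2, "B", 12, "<10"], [3, "C", 45, ">=45"]],
    columns=["id", "input", "val", "cond"],
)
df_in["output"] = [check(v, c) for v, c in zip(df_in["val"], df_in["cond"])]
```

Unlike eval, a malformed or malicious string in "cond" raises a ValueError here instead of being run as code.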
Time Comparison: using %%timeit -n 1000
Upvotes: 2
Reputation: 13349
Using numexpr:
import numexpr
df_in['output'] = df_in.apply(lambda x: numexpr.evaluate(f"{x['val']}{x['cond']}"), axis=1)
   id input  val  cond  output
0   1     A   32   >30    True
1   2     B   12   <10   False
2   3     C   45  >=45    True
Time comparison (using %%timeit -n 1000):
Using apply and numexpr:
865 µs ± 140 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Using pd.eval:
2.5 ms ± 363 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
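The timings above both build and evaluate one expression per row. For larger frames, a vectorized variant may be worth trying; the following is only a sketch under the assumption that each condition is a single comparison symbol followed by a number. It has not been benchmarked against the numbers above, and the names parts, methods and output are illustrative.

```python
import pandas as pd

df_in = pd.DataFrame(
    [[1, "A", 32, ">30"], [2, "B", 12, "<10"], [3, "C", 45, ">=45"]],
    columns=["id", "input", "val", "cond"],
)

# Split each condition into its comparison symbol and its numeric threshold.
parts = df_in["cond"].str.extract(r"^\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")
parts.columns = ["op", "threshold"]
threshold = pd.to_numeric(parts["threshold"])

# Name of the Series comparison method that corresponds to each symbol.
methods = {">": "gt", "<": "lt", ">=": "ge", "<=": "le", "==": "eq", "!=": "ne"}

# Rows whose condition does not match the pattern keep the default False.
output = pd.Series(False, index=df_in.index)
for symbol, name in methods.items():
    mask = parts["op"] == symbol
    if mask.any():
        # One vectorized comparison per distinct symbol instead of one per row.
        compare = getattr(df_in.loc[mask, "val"], name)
        output.loc[mask] = compare(threshold[mask]).to_numpy()

df_in["output"] = output
```

The loop runs once per comparison symbol rather than once per row, so its cost should depend mostly on the number of distinct symbols.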
Upvotes: 1