Reputation: 21
def func(row):
    # 2: exact or reversed match, 1: one shared letter, 0: no overlap
    if row.GT_x == row.GT_y or row.GT_x == row.GT_y[::-1]:
        return 2
    elif set(row.GT_x) & set(row.GT_y):
        return 1
    else:
        return 0
%%timeit
merged_df['Decision'] = merged_df.apply(func, axis=1)
1 loop, best of 3: 30.2 s per loop
I'm applying func to every row of the dataframe, which has approximately 650,000 rows.
I suspect pandas.apply() takes even longer than iterating with a plain for loop.
I also tried a lambda instead of func, but the result was the same.
My dataframe has two columns, GT_x and GT_y, each holding a two-letter value such as "AA" or "BB". func returns 2 if GT_x and GT_y are the same (including reversed order), 1 if they share one letter, and 0 otherwise.
I'm creating a new column (Decision) by applying func.
Could you recommend a faster method?
Here's a sample of my data:
GT_x GT_y
0 AG GA
1 AA GA
2 AA GG
3 GG GG
...
65000 GG GG
Index 0 should give 2, index 1 should give 1, index 2 should give 0, and indices 3 and 65,000 should give 2.
Upvotes: 2
Views: 2715
Reputation: 1028
You can use df.apply(func, axis=1, raw=True) for faster computation; in that case the input to your function will be a raw NumPy array instead of a Series.
From the apply function's documentation:
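A minimal sketch of what that looks like, using a small frame mirroring the question's sample data (the column order assumption — GT_x first, GT_y second — matters, because with raw=True the function sees positions, not labels):

```python
import pandas as pd

# Small frame mirroring the question's GT_x / GT_y columns.
merged_df = pd.DataFrame({'GT_x': ['AG', 'AA', 'AA', 'GG'],
                          'GT_y': ['GA', 'GA', 'GG', 'GG']})

def func(row):
    # With raw=True, `row` is a plain ndarray: row[0] is GT_x, row[1] is GT_y.
    x, y = row[0], row[1]
    if x == y or x == y[::-1]:
        return 2
    elif set(x) & set(y):
        return 1
    return 0

merged_df['Decision'] = merged_df.apply(func, axis=1, raw=True)
print(merged_df['Decision'].tolist())  # [2, 1, 0, 2]
```

Note that raw=True helps most when the function does NumPy-style work; for per-row Python string logic like this, the saving is mainly the avoided Series construction.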
raw : boolean, default False
    If False, convert each row or column into a Series. If raw=True the
    passed function will receive ndarray objects instead. If you are just
    applying a NumPy reduction function this will achieve much better
    performance
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
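If you want to drop apply entirely, the logic can be vectorized with pandas string methods, since every value in the sample data is exactly two characters (that length assumption is mine, based on the sample):

```python
import pandas as pd
import numpy as np

# Frame mirroring the question's sample data.
merged_df = pd.DataFrame({'GT_x': ['AG', 'AA', 'AA', 'GG'],
                          'GT_y': ['GA', 'GA', 'GG', 'GG']})

x, y = merged_df['GT_x'], merged_df['GT_y']

# 2: exact match or reversed match.
exact = (x == y) | (x == y.str[::-1])

# 1: at least one shared letter; with two-letter values,
# checking the four positional pairs covers every overlap.
shared = (x.str[0] == y.str[0]) | (x.str[0] == y.str[1]) | \
         (x.str[1] == y.str[0]) | (x.str[1] == y.str[1])

merged_df['Decision'] = np.where(exact, 2, np.where(shared, 1, 0))
print(merged_df['Decision'].tolist())  # [2, 1, 0, 2]
```

Boolean comparisons on whole columns like this avoid the per-row Python call overhead, which is what dominates the 30 s timing with apply.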
Upvotes: 1