Reputation: 1705
A pandas Series or DataFrame has an apply method that lets you apply a synchronous function, for example:
import numpy as np
import pandas as pd
def fun(x):
    return x * 2
df = pd.DataFrame(np.arange(10), columns=['old'])
df['new'] = df['old'].apply(fun)
What is the fastest way to do the same thing when the function to be applied, fun2, is async? For example:
import asyncio
import numpy as np
import pandas as pd
async def fun2(x):
    return x * 2
async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    df['new'] = 0
    for i in range(len(df)):
        # Await one call at a time; .loc avoids chained assignment.
        df.loc[i, 'new'] = await fun2(df.loc[i, 'old'])
    print(df)
asyncio.run(main())
Upvotes: 16
Views: 14504
Reputation: 35626
Use asyncio.gather and overwrite the whole column once it completes.
import asyncio
import numpy as np
import pandas as pd
async def fun2(x):
    return x * 2
async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    df['new'] = await asyncio.gather(*(fun2(v) for v in df['old']))
    print(df)
asyncio.run(main())
This passes each value in the column to the async function and schedules all of the calls concurrently. When fun2 actually waits on something (network or other I/O), this is much faster than awaiting each result sequentially in a loop.
Note: asyncio.gather returns results in the same order as the awaitables passed in, so row order is preserved, and the column is not assigned until all awaitables have completed successfully.
Resulting output DataFrame:
   old  new
0    0    0
1    1    2
2    2    4
3    3    6
4    4    8
5    5   10
6    6   12
7    7   14
8    8   16
9    9   18
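To make the ordering and concurrency points above concrete, here is a minimal sketch. It assumes a hypothetical slow_double coroutine that stands in for real I/O by sleeping, with the earliest rows given the longest delay so that they finish last. The names slow_double, build_frame and delays are illustrative and not part of the original answer.

```python
import asyncio

import pandas as pd


async def slow_double(x, delay):
    # Hypothetical stand-in for real I/O (e.g. a network call):
    # wait for `delay` seconds, then return the doubled value.
    await asyncio.sleep(delay)
    return x * 2


async def build_frame():
    df = pd.DataFrame({"old": range(5)})
    # Earlier rows get the longest delay, so they complete last.
    delays = [0.05 * (len(df) - i) for i in range(len(df))]
    results = await asyncio.gather(
        *(slow_double(v, d) for v, d in zip(df["old"], delays))
    )
    # gather returns results in the order the awaitables were passed in,
    # not the order they finished, so row i still gets double of old[i].
    df["new"] = results
    return df


df = asyncio.run(build_frame())
print(df)
```

Because the sleeps overlap, the total wait should be close to the longest single delay instead of the sum of all delays, and the new column still lines up with old even though the calls completed in reverse order.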
Upvotes: 22