Reputation: 139
My program will concurrently download about 10 million pieces of data with aiohttp and then write them to about 4,000 files on disk.
I use the aiofiles library so that my program can do other work while it is reading/writing a file.
But I worry that if the program tries to write to all 4,000 files at the same time, the hard disk won't be able to keep up with the writes.
Is it possible to limit the number of concurrent writes with aiofiles (or another library)? Does aiofiles already do this?
Thanks.
test code:
import aiofiles
import asyncio

async def write_to_disk(fname):
    async with aiofiles.open(fname, "w+") as f:
        await f.write("asdf")

async def main():
    tasks = [asyncio.create_task(write_to_disk("%d.txt" % i))
             for i in range(10)]
    await asyncio.gather(*tasks)

asyncio.run(main())
Upvotes: 1
Views: 2099
Reputation: 1390
You can use asyncio.Semaphore to limit the number of concurrent tasks. Simply force your write_to_disk function to acquire the semaphore before writing:
import aiofiles
import asyncio

async def write_to_disk(fname, sema):
    # Edit to address comment: acquire semaphore after opening file
    async with aiofiles.open(fname, "w+") as f, sema:
        print("Writing", fname)
        await f.write("asdf")
        print("Done writing", fname)

async def main():
    sema = asyncio.Semaphore(3)  # Allow 3 concurrent writers
    tasks = [asyncio.create_task(write_to_disk("%d.txt" % i, sema))
             for i in range(10)]
    await asyncio.gather(*tasks)

asyncio.run(main())
Note both the sema = asyncio.Semaphore(3) line as well as the addition of sema in the async with statement.
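
As the comment in the code notes, the semaphore above is acquired after the file is opened, so only the writes themselves are throttled. If you also want to cap how many files are open at once (operating systems limit open file descriptors, which can matter with ~4,000 files), you can acquire the semaphore before opening instead. A minimal sketch of that variant, which is my assumption about your workload and not part of the answer above:

import aiofiles
import asyncio

async def write_to_disk(fname, sema):
    # Acquire the semaphore first: at most 3 files are open at any moment,
    # at the cost of also serializing the (cheap) open() calls.
    async with sema:
        async with aiofiles.open(fname, "w+") as f:
            await f.write("asdf")

async def main():
    sema = asyncio.Semaphore(3)
    tasks = [asyncio.create_task(write_to_disk("%d.txt" % i, sema))
             for i in range(10)]
    await asyncio.gather(*tasks)

asyncio.run(main())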
Output:
"""
Writing 1.txt
Writing 0.txt
Writing 2.txt
Done writing 1.txt
Done writing 0.txt
Done writing 2.txt
Writing 3.txt
Writing 4.txt
Writing 5.txt
Done writing 3.txt
Done writing 4.txt
Done writing 5.txt
Writing 6.txt
Writing 7.txt
Writing 8.txt
Done writing 6.txt
Done writing 7.txt
Done writing 8.txt
Writing 9.txt
Done writing 9.txt
"""
Upvotes: 2