Reputation: 2182
I wish to read several log files as they are written and process their input with asyncio. The code will have to run on Windows. From what I understand from searching around both stackoverflow and the web, asynchronous file I/O is tricky on most operating systems (select will not work as intended, for example). While I'm sure I could do this with other methods (e.g. threads), I thought I would try out asyncio to see what it is like. The most helpful answer would probably be one that describes what the "architecture" of a solution to this problem should look like, i.e. how the different functions and coroutines should be called or scheduled.
The following gives me a generator that reads the files line by line (through polling, which is acceptable):
import time

def line_reader(f):
    while True:
        line = f.readline()
        if not line:
            time.sleep(POLL_INTERVAL)
            continue
        process_line(line)
With several files to monitor and process, this sort of code would require threads. I have modified it slightly to be more usable with asyncio:
import asyncio

def line_reader(f):
    while True:
        line = f.readline()
        if not line:
            yield from asyncio.sleep(POLL_INTERVAL)
            continue
        process_line(line)
This sort of works when I schedule it through the asyncio event loop, but if process_data blocks, then that is of course not good. When starting out, I imagined the solution would look something like
def process_data():
    ...
    while True:
        ...
        line = yield from line_reader()
        ...
but I could not figure out how to make that work (at least not without process_data managing quite a bit of state).
Any ideas on how I should structure this kind of code?
Upvotes: 52
Views: 72350
Reputation: 832
Anyio has great support for asynchronous file I/O. It even supports a pathlib-style API. See: https://anyio.readthedocs.io/en/stable/fileio.html
from anyio import open_file, run

async def main():
    async with await open_file('/some/path/somewhere') as f:
        async for line in f:
            print(line, end='')

run(main)
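For the pathlib-style API mentioned above, a minimal sketch might look like this (the path is a placeholder; anyio.Path mirrors pathlib.Path, but its I/O methods are awaitable):

from anyio import Path, run

async def main():
    # read_text() is the async counterpart of pathlib.Path.read_text()
    text = await Path('/some/path/somewhere').read_text()
    print(text, end='')

run(main)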
Upvotes: 6
Reputation: 8065
Using aiofiles:

async with aiofiles.open('filename', mode='r') as f:
    async for line in f:
        print(line)
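A self-contained version of the same idea, assuming a reasonably recent aiofiles ('filename' is a placeholder path):

import asyncio
import aiofiles

async def main():
    async with aiofiles.open('filename', mode='r') as f:
        async for line in f:
            print(line, end='')

asyncio.run(main())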
EDIT 1
As @Jashandeep mentioned, you need to take care with blocking operations.
Another method is select and/or epoll:

from select import select

files_to_read, files_to_write, exceptions = select([f1, f2], [f1, f2], [f1, f2], timeout=.1)

The timeout parameter is important here.
See: https://docs.python.org/3/library/select.html#select.select
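As a rough sketch of how that call behaves, using a socket pair rather than regular files (on Windows, select() only accepts sockets):

import socket
from select import select

# A connected socket pair stands in for the file objects above.
r, w = socket.socketpair()
w.send(b'ping')

readable, writable, exceptional = select([r], [w], [r, w], 0.1)
if readable:
    print(r.recv(1024))

r.close()
w.close()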
EDIT 2
You can register a file descriptor for read/write callbacks with loop.add_reader().
The event loop then watches it with its internal epoll (or equivalent selector) handler.
EDIT 3
But remember that epoll will not work with regular files.
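A minimal sketch of add_reader() with a socket pair, for the reason above (the callback and data are illustrative; note that the Windows Proactor event loop does not support add_reader() at all):

import asyncio
import socket

async def main():
    loop = asyncio.get_running_loop()
    rsock, wsock = socket.socketpair()
    ready = asyncio.Event()

    # Invoke the callback when rsock becomes readable
    loop.add_reader(rsock, ready.set)
    wsock.send(b'hello')

    await ready.wait()
    print(rsock.recv(1024))

    loop.remove_reader(rsock)
    rsock.close()
    wsock.close()

asyncio.run(main())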
Upvotes: 50
Reputation: 5173
From what I understand from searching around both stackoverflow and the web, asynchronous file I/O is tricky on most operating systems (select will not work as intended, for example). While I'm sure I could do this with other methods (e.g. threads), I thought I would try out asyncio to see what it is like.
asyncio is select-based on *nix systems under the hood, so you won't be able to do non-blocking file I/O without the use of threads. On Windows, asyncio can use IOCP, which does support non-blocking file I/O, but asyncio doesn't take advantage of that for file operations.
Your code is fine, except you should do blocking I/O calls in threads, so that you don't block the event loop if the I/O is slow. Fortunately, it is really simple to offload work to threads using the loop.run_in_executor function.
First, set up a dedicated thread pool for your I/O:
from concurrent.futures import ThreadPoolExecutor
io_pool_exc = ThreadPoolExecutor()
And then simply offload any blocking I/O calls to the executor:
...
line = yield from loop.run_in_executor(io_pool_exc, f.readline)
...
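Putting the pieces together for the multi-file case, here is a rough sketch in modern async/await syntax (the file names, POLL_INTERVAL and the process_line callbacks are placeholders, not part of the original code):

import asyncio
from concurrent.futures import ThreadPoolExecutor

io_pool_exc = ThreadPoolExecutor()
POLL_INTERVAL = 1.0  # placeholder polling interval

async def tail(path, process_line):
    loop = asyncio.get_running_loop()
    with open(path) as f:
        while True:
            # readline() runs in the thread pool, so a slow read never blocks the loop
            line = await loop.run_in_executor(io_pool_exc, f.readline)
            if not line:
                await asyncio.sleep(POLL_INTERVAL)
                continue
            process_line(line)

async def main():
    # Tail several log files concurrently on a single event loop
    await asyncio.gather(
        tail('a.log', lambda line: print('a:', line, end='')),
        tail('b.log', lambda line: print('b:', line, end='')),
    )

asyncio.run(main())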
Upvotes: 32
Reputation: 17376
asyncio doesn't support file operations yet, sorry.
Thus it cannot help with your problem.
Upvotes: -3