Reputation: 65
I'm writing a spider to crawl web pages. I know asyncio maybe my best choice. So I use coroutines to process the work asynchronously. Now I scratch my head about how to quit the program by keyboard interrupt. The program could shut down well after all the works have been done. The source code could be run in python 3.5 and is attatched below.
import asyncio
import aiohttp
from contextlib import suppress
class Spider(object):
def __init__(self):
self.max_tasks = 2
self.task_queue = asyncio.Queue(self.max_tasks)
self.loop = asyncio.get_event_loop()
self.counter = 1
def close(self):
for w in self.workers:
w.cancel()
async def fetch(self, url):
try:
async with aiohttp.ClientSession(loop = self.loop) as self.session:
with aiohttp.Timeout(30, loop = self.session.loop):
async with self.session.get(url) as resp:
print('get response from url: %s' % url)
except:
pass
finally:
pass
async def work(self):
while True:
url = await self.task_queue.get()
await self.fetch(url)
self.task_queue.task_done()
def assign_work(self):
print('[*]assigning work...')
url = 'https://www.python.org/'
if self.counter > 10:
return 'done'
for _ in range(self.max_tasks):
self.counter += 1
self.task_queue.put_nowait(url)
async def crawl(self):
self.workers = [self.loop.create_task(self.work()) for _ in range(self.max_tasks)]
while True:
if self.assign_work() == 'done':
break
await self.task_queue.join()
self.close()
def main():
loop = asyncio.get_event_loop()
spider = Spider()
try:
loop.run_until_complete(spider.crawl())
except KeyboardInterrupt:
print ('Interrupt from keyboard')
spider.close()
pending = asyncio.Task.all_tasks()
for w in pending:
w.cancel()
with suppress(asyncio.CancelledError):
loop.run_until_complete(w)
finally:
loop.stop()
loop.run_forever()
loop.close()
if __name__ == '__main__':
main()
But if I press 'Ctrl+C' while it's running, some strange errors may occur. I mean sometimes the program could be shut down by 'Ctrl+C' gracefully. No error message. However, in some cases the program will be still running after pressing 'Ctrl+C' and wouldn't stop until all the works have been done. If I press 'Ctrl+C' at that moment, 'Task was destroyed but it is pending!' would be there.
I have read some topics about asyncio and add some code in main() to close coroutines gracefully. But it not work. Is someone else has the similar problems?
Upvotes: 5
Views: 2705
Reputation: 39546
I bet problem happens here:
except:
pass
You should never do such thing. And your situation is one more example of what can happen otherwise.
When you cancel task and await for its cancellation, asyncio.CancelledError
raised inside task and shouldn't be suppressed anywhere inside. Line where you await of your task cancellation should raise this exception, otherwise task will continue execution.
That's why you do
task.cancel()
with suppress(asyncio.CancelledError):
loop.run_until_complete(task) # this line should raise CancelledError,
# otherwise task will continue
to actually cancel task.
Upd:
But I still hardly understand why the original code could quit well by 'Ctrl+C' at a uncertain probability?
It dependence of state of your tasks:
CancelledError
on awaiting and your code will finished normally.Upvotes: 3
Reputation: 59436
I assume you are using any flavor of Unix; if this is not the case, my comments might not apply to your situation.
Pressing Ctrl-C in a terminal sends all processes associated with this tty the signal SIGINT
. A Python process catches this Unix signal and translates this into throwing a KeyboardInterrupt
exception. In a threaded application (I'm not sure if the async
stuff internally is using threads, but it very much sounds like it does) typically only one thread (the main thread) receives this signal and thus reacts in this fashion. If it is not prepared especially for this situation, it will terminate due to the exception.
Then the threading administration will wait for the still running fellow threads to terminate before the Unix process as a whole terminates with an exit code. This can take quite a long time. See this question about killing fellow threads and why this isn't possible in general.
What you want to do, I assume, is kill your process immediately, killing all threads in one step.
The easiest way to achieve this is to press Ctrl-\. This will send a SIGQUIT
instead of a SIGINT
which typically influences also the fellow threads and causes them to terminate.
If this is not enough (because for whatever reason you need to react properly on Ctrl-C), you can send yourself a signal:
import os, signal
os.kill(os.getpid(), signal.SIGQUIT)
This should terminate all running threads unless they especially catch SIGQUIT
in which case you still can use SIGKILL
to perform a hard kill on them. This doesn't give them any option of reacting, though, and might lead to problems.
Upvotes: 0