Reputation: 9
i am trying to write a spider with multiprocessing module
here is my python code:
# -*- coding:utf-8 -*-
import multiprocessing
import requests
class SpiderWorker(object):
def __init__(self, q):
self._q = q
def run(self):
def _crawl_item(url):
requests.get("http://www.baidu.com")
if respon.ok:
print respon.url
while True:
rst = self._q.get()
_crawl_item(rst)
def general_worker():
q = multiprocessing.Queue()
CPU_COUNT = multiprocessing.cpu_count()
worker_processes = [
multiprocessing.Process(target=SpiderWorker(q).run)
for i in range(CPU_COUNT)
]
map( lambda process: process.start(), worker_processes )
return q, worker_processes
maybe it is my process way wrong every time i run this code, my process tell me
<Process(Process-1, stopped[SIGSEGV])>
hope love it
Upvotes: 0
Views: 385
Reputation: 2274
The major problem here is that you don't have any information on why your processes fail. It could be gevent, but it could just as easily be something else. So learning the actual reason why your processes get terminated is the first step before doing anything else.
What you need is multiprocessing.log_to_stderr()
:
class SpiderWorker(object):
# ...
def run(self):
logger = multiprocessing.log_to_stderr()
logger.setLevel(multiprocessing.SUBDEBUG)
try:
# Here goes your original run() code
except Exception:
logger.exception('whoopsie')
What this code does:
multiprocessing
module events (just in case as you probably don't need them)..exception()
method on the logger, which not only logs the message (it's meaningless anyway as we don't know what actually happens) but most importantly logs the entire error traceback - which we actually need.Upvotes: 2