lianghh

Reputation: 9

Python multiprocessing module does not work

I am trying to write a spider with the multiprocessing module.

Here is my Python code:

# -*- coding:utf-8 -*-

import multiprocessing
import requests


class SpiderWorker(object):


    def __init__(self, q):
        self._q = q

    def run(self):

        def _crawl_item(url):
            # Keep the response so it can be inspected below
            respon = requests.get("http://www.baidu.com")
            if respon.ok:
                print(respon.url)

        while True:
            rst = self._q.get()
            _crawl_item(rst)


def general_worker():

    q = multiprocessing.Queue()

    CPU_COUNT = multiprocessing.cpu_count()

    worker_processes = [
        multiprocessing.Process(target=SpiderWorker(q).run)
        for i in range(CPU_COUNT)
    ]

    # Start every worker explicitly (map() is lazy on Python 3)
    for process in worker_processes:
        process.start()

    return q, worker_processes

Maybe I am using processes the wrong way. Every time I run this code, my process reports:

<Process(Process-1, stopped[SIGSEGV])>

Any help would be appreciated.

Upvotes: 0

Views: 385

Answers (1)

Lav

Reputation: 2274

The major problem here is that you don't have any information on why your processes fail. It could be gevent, but it could just as easily be something else. So learning the actual reason why your processes get terminated is the first step before doing anything else.

What you need is multiprocessing.log_to_stderr():

class SpiderWorker(object):
    # ...
    def run(self):
        logger = multiprocessing.log_to_stderr()
        logger.setLevel(multiprocessing.SUBDEBUG)
        try:
            # Here goes your original run() code
            pass  # placeholder so the try block is valid Python
        except Exception:
            logger.exception('whoopsie')

What this code does:

  1. Creates a special logger that writes its messages to stderr (the console by default), which the worker processes normally share with the main process.
  2. Configures this logger to report everything, including some internal multiprocessing module events (just in case; you probably don't need them).
  3. Wraps your entire code in a catch-all try/except statement so that whatever happens there cannot escape your notice.
  4. Calls the .exception() method on the logger, which not only logs the message (meaningless here, since we don't know what actually happens) but, most importantly, logs the entire error traceback, which is what we actually need.
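
To make the four steps concrete, here is a minimal self-contained sketch of the same pattern. It is only an illustration under some assumptions: crawl_item is a hypothetical stand-in for your real request code (it makes no network call and fails on purpose for one input), a None sentinel is used so the worker can stop, and logging.DEBUG is used in place of multiprocessing.SUBDEBUG.

```python
import logging
import multiprocessing


def crawl_item(url):
    # Hypothetical stand-in for the real request logic. It fails on
    # purpose for non-HTTP input so there is a traceback to log.
    if not url.startswith("http"):
        raise ValueError("unsupported URL: %r" % (url,))
    return url


def run_worker(q):
    # Steps 1 and 2: logger that writes to stderr, set to a verbose level.
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(logging.DEBUG)
    processed = 0
    try:
        # Step 3: the whole worker body sits inside the catch-all.
        while True:
            url = q.get()
            if url is None:  # sentinel (an assumption): no more work
                break
            crawl_item(url)
            processed += 1
    except Exception:
        # Step 4: logs the message plus the full traceback.
        logger.exception("worker failed")
    return processed


if __name__ == "__main__":
    q = multiprocessing.Queue()
    worker = multiprocessing.Process(target=run_worker, args=(q,))
    worker.start()
    for url in ["http://www.baidu.com", "not-a-url"]:
        q.put(url)
    worker.join()
    print(worker.exitcode)
```

Running this should print a traceback for the second item on stderr, prefixed with the process name. Note that a catch-all only sees Python exceptions: if the interpreter itself is killed by a signal such as SIGSEGV, the except block never runs, and the exit code of the process plus the logger's internal messages are what you have to go on.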

Upvotes: 2
