Reputation: 441
In my Django app I use Selenium to crawl and parse some HTML pages. I tried to introduce multiprocessing to improve performance. This is my code:
import os
from selenium import webdriver
from multiprocessing import Pool

os.environ["DISPLAY"] = ":56017"

def render_js(url):
    driver = webdriver.Firefox()
    driver.set_page_load_timeout(300)
    driver.get(url)
    text = driver.page_source
    driver.quit()
    return text

def parsing(url):
    text = render_js(url)
    ... parsing the text ....
    ... write in db....

url_list = ['www.google.com', 'www.python.com', 'www.microsoft.com']
pool = Pool(processes=2)
pool.map_async(parsing, url_list)
pool.close()
pool.join()
I get this error when two processes run simultaneously and both use Selenium: the first process starts Firefox with 'www.google.it' and returns the correct text, while the second, with URL 'www.python.com', returns the text of www.google.it instead of www.python.com. Can you tell me where I'm going wrong?
Upvotes: 2
Views: 4992
Reputation: 71
from selenium import webdriver
from multiprocessing import Pool

def parsing(url):
    driver = webdriver.Chrome()
    driver.set_page_load_timeout(300)
    driver.get(url)
    text = driver.page_source
    driver.quit()  # quit() shuts down the browser and the driver process
    return text

url_list = ['http://www.google.com', 'http://www.python.com']
pool = Pool(processes=4)
ret = pool.map(parsing, url_list)
for text in ret:
    print(text[:30])
I tried running your code and Selenium complained about bad URLs; adding http:// to them made it work.
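If you don't want to hand-edit the list, you can add the scheme automatically before handing the URLs to the pool. A minimal sketch using only the standard library (the `normalize_url` helper is hypothetical, not part of the original answer):

```python
from urllib.parse import urlparse

def normalize_url(url):
    """Prepend http:// when the URL has no scheme, so the driver
    doesn't reject it as a relative address."""
    if not urlparse(url).scheme:
        return 'http://' + url
    return url

url_list = [normalize_url(u) for u in ['www.google.com', 'www.python.com']]
print(url_list)  # ['http://www.google.com', 'http://www.python.com']
```

Already-qualified URLs (e.g. `https://...`) pass through unchanged.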
Upvotes: 3