RoverDar

Reputation: 441

Selenium and multiprocessing in Python

In my Django app I use Selenium to crawl and parse some HTML pages. I tried to introduce multiprocessing to improve performance. This is my code:

import os
from selenium import webdriver
from multiprocessing import Pool

os.environ["DISPLAY"] = ":56017"

def render_js(url):
    driver = webdriver.Firefox()
    driver.set_page_load_timeout(300)
    driver.get(url)
    text = driver.page_source
    driver.quit()
    return text

def parsing(url):
    text = render_js(url)
    # ... parse the text ...
    # ... write to the db ...


url_list = ['www.google.com','www.python.com','www.microsoft.com']
pool = Pool(processes=2)
pool.map_async(parsing, url_list)
pool.close()
pool.join()

I have this problem when the two processes run simultaneously and use Selenium: the first process starts Firefox with 'www.google.com' and returns the correct text, but the second, given the URL 'www.python.com', returns the text of www.google.com instead of www.python.com. Can you tell me where I'm wrong?
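To narrow this down, one thing I can do is log each worker's PID together with the URL the driver actually reports. A minimal throwaway sketch of the same setup (render_js_debug and its print format are mine, not part of the app):

import os
from selenium import webdriver

def render_js_debug(url):
    # Same flow as render_js, plus one line showing which worker
    # process fetched which page, to confirm the crossed results.
    driver = webdriver.Firefox()
    driver.set_page_load_timeout(300)
    driver.get(url)
    print('pid=%s requested=%s landed_on=%s'
          % (os.getpid(), url, driver.current_url))
    text = driver.page_source
    driver.quit()
    return text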

Upvotes: 2

Views: 4992

Answers (1)

Alan

Reputation: 71

from selenium import webdriver
from multiprocessing import Pool

def parsing(url):
    # Each worker process gets its own browser instance.
    driver = webdriver.Chrome()
    driver.set_page_load_timeout(300)
    driver.get(url)
    text = driver.page_source
    driver.quit()  # quit() also shuts down the driver process, unlike close()
    return text

url_list = ['http://www.google.com', 'http://www.python.com']
pool = Pool(processes=4)
ret = pool.map(parsing, url_list)
for text in ret:
    print(text[:30])

I tried running your code and Selenium complained about bad URLs; adding http:// to them made it work.
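If the list can contain bare hostnames, one option is to prepend a scheme before handing the URLs to the pool. A small sketch (with_scheme is just an illustrative helper, not a library function):

from urllib.parse import urlparse  # on Python 2: from urlparse import urlparse

def with_scheme(url):
    # Prepend http:// when no scheme is present, so the driver
    # doesn't reject bare hostnames like 'www.google.com'.
    return url if urlparse(url).scheme else 'http://' + url

url_list = [with_scheme(u) for u in ['www.google.com', 'www.python.com']]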

Upvotes: 3
