How to use selenium with python in multithreaded way

Question

Hey guys I am trying to work with selenium using threads. My code is :-

import threading  as th
import time
import base64
import mysql.connector as mysql
import requests
from bs4 import BeautifulSoup
from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options
from functions import *

options = Options()
prefs = {'profile.default_content_setting_values': {'images': 2,'popups': 2, 'geolocation': 2, 
                            'notifications': 2, 'auto_select_certificate': 2, 'fullscreen': 2, 
                            'mouselock': 2, 'mixed_script': 2, 'media_stream': 2, 
                            'media_stream_mic': 2, 'media_stream_camera': 2, 'protocol_handlers': 2, 
                            'ppapi_broker': 2, 'automatic_downloads': 2, 'midi_sysex': 2, 
                            'push_messaging': 2, 'ssl_cert_decisions': 2, 'metro_switch_to_desktop': 2, 
                            'protected_media_identifier': 2, 'app_banner': 2, 'site_engagement': 2, 
                            'durable_storage': 2}}
print('Crawling process started')
options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(executable_path='chromedriver.exe', options=options)
driver.set_page_load_timeout(50000)
urls='https://google.com https://youtube.com'
def getinf(url_):
    driver.get(url_)
    soup=BeautifulSoup(driver.page_source, 'html5lib')
    print(soup.select('title'))
for url in urls.split():
    t=th.Thread(target=getinf, args=(url,))
    t.start()

When the script run the tabs are not opened at once as I expected(from threads) instead the process is done one by one and the title of last url(https://youtube.com) is only printed. when I try Multiprocessing , program crashes many times. I am making a web crawler and some websites(like twitter) requires JavaScript for showing content, so I can't use requests or urllib as well. What can be the solution for this. Any other library suggestion will be welcomed.

Jeni · Accepted Answer

Try putting the creation of the chromedriver in the thread code. Otherwise you have one driver, and you are changing the url of one and the same driver. Instead try to create separate chromedriver for each thread.

Note: I have not tried the code, just suggestion.

import threading  as th
import time
import base64
import mysql.connector as mysql
import requests
from bs4 import BeautifulSoup
from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options
from functions import *

options = Options()
prefs = {'profile.default_content_setting_values': {'images': 2,'popups': 2, 'geolocation': 2, 
                            'notifications': 2, 'auto_select_certificate': 2, 'fullscreen': 2, 
                            'mouselock': 2, 'mixed_script': 2, 'media_stream': 2, 
                            'media_stream_mic': 2, 'media_stream_camera': 2, 'protocol_handlers': 2, 
                            'ppapi_broker': 2, 'automatic_downloads': 2, 'midi_sysex': 2, 
                            'push_messaging': 2, 'ssl_cert_decisions': 2, 'metro_switch_to_desktop': 2, 
                            'protected_media_identifier': 2, 'app_banner': 2, 'site_engagement': 2, 
                            'durable_storage': 2}}
print('Crawling process started')
options.add_experimental_option('prefs', prefs)
urls='https://google.com https://youtube.com'
def getinf(url_):
    driver = webdriver.Chrome(executable_path='chromedriver.exe', options=options)
    driver.set_page_load_timeout(50000)
    driver.get(url_)
    soup=BeautifulSoup(driver.page_source, 'html5lib')
    print(soup.select('title'))
for url in urls.split():
    t=th.Thread(target=getinf, args=(url,))
    t.start()

How to use selenium with python in multithreaded way

Answers (1)

Related Questions