Patrick Potocny

Reputation: 154

Multiprocessing in selenium opens new window for each process

I'm developing a web bot that completes typing exercises on a certain website. To do so I'm using Selenium and undetected-chromedriver (an optimized version of the regular chromedriver that does not trigger anti-bot services; the functionality remains the same). The bot opens the website, logs in, opens the current exercise and starts typing. I need it to stop typing once the exercise is done, and the only way to detect that is to keep checking whether the results have appeared and then stop the type_text function. Originally, inside type_text's for loop, I checked var_to_check after each letter was typed. But checking after every letter slows the typing down too much. So I thought about using multiprocessing to run the check concurrently with type_text. I created the check_results function, which checks var_to_check every 0.1 s or so; if it detects that the exercise has ended, it sets shared_var to 1, and type_text then detects that and stops typing.
The problem is that when the check_results and type_text processes are started, two new Chrome windows open, and in both of them the script starts all over again: it opens the website and tries to log me in, which fails because I'm already logged in in the first window. Meanwhile nothing happens in the original window. No errors are thrown.

I'm fairly new to multiprocessing, so using multiprocessing.Value as shared memory between processes might not be the best approach, but it's what I've come up with so far. Any improvements or better ways of handling this problem are appreciated. What I don't understand is why it opens two new windows and starts the script "all over again" in them. I need both processes to run on the original Chrome window. I tried the regular Selenium webdriver instead of undetected-chromedriver, but nothing changed.

Here is the code. It's shortened from my original so it's more readable, but the principle stays the same:

import time  # needed for time.sleep in check_results
from multiprocessing import Process, Value
from undetected_chromedriver import v2 as uc #selenium webdriver add-on 

driver = uc.Chrome()
# using context manager bcs of undetected_chromedriver module 
with driver:
    driver.get('https://some_site.com/')


    def login(driver=driver):
        # logs me into the website and opens typing exercise
        some_code


    def check_results(shared_var, driver=driver):
        # this func i need to run concurrently with type_text 
        keep_checking = 1
        while keep_checking:
            var_to_check = driver.find_element_by_xpath(some_xpath).text
            if var_to_check != '':
                with shared_var.get_lock():
                    shared_var.value = 1
                keep_checking = 0
            time.sleep(0.1)


    def type_text(shared_var, driver=driver):
        text = 'Some text i need to be typed out'
        some_element = driver.find_element_by_xpath(some_xpath)

        for letter in text:
            some_element.send_keys(letter)

            #  original way i did the check but it slows down the typing
            # var_to_check = driver.find_element_by_xpath(some_xpath).text
            # if var_to_check != '':
            #     break

            # after every letter i need to check state of shared_var
            with shared_var.get_lock():
                if shared_var.value == 1:
                    break
        print('Typing ended')

    
    def do_exercise():
      
        exc_ended = Value("i", 0)

        check_var_process = Process(
            target=check_results, args=(exc_ended, ))
        type_text_process = Process(
            target=type_text, args=(exc_ended, ))

        check_var_process.start()
        type_text_process.start()
        # at this point it opens two new chrome windows 
        check_var_process.join()
        type_text_process.join()

        results_of_typing = driver.find_element_by_xpath(some_xpath).text

        return results_of_typing
    

    if __name__ == '__main__':
        login()
        results = do_exercise()
        print(results)

Upvotes: 1

Views: 2360

Answers (1)

Booboo

Reputation: 44128

First, when you post a question tagged with multiprocessing you are supposed to also add a tag specifying the platform you are running under. You should have sacrificed one of your tags such as selenium-webdriver to have done that. But I suspect that you are running under a platform such as Windows that uses the spawn method to create new processes. This means that to initiate a new process a new, empty address space is created, a Python interpreter is loaded and your source file is re-executed from the top. This is why you must have if __name__ == '__main__': controlling code that creates new processes. For if you didn't, you would get into a recursive loop creating new processes ad infinitum.
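The re-execution described above can be observed without Selenium. The following is only a minimal sketch: the script text, the file name spawn_demo.py and the helper count_top_level_runs are made up for illustration, and the print call stands in for whatever sits at global scope (such as creating the driver). The demo script is written to a temporary file and run with the current interpreter so that it behaves the same regardless of how this snippet itself is launched.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical demo script: one top-level statement plus a guarded Process start.
DEMO = textwrap.dedent('''
    import multiprocessing as mp

    print("top-level code ran")  # stands in for code at global scope

    def worker():
        pass

    if __name__ == "__main__":
        mp.set_start_method("spawn")  # force the method Windows uses by default
        p = mp.Process(target=worker)
        p.start()
        p.join()
''')


def count_top_level_runs():
    """Run the demo script and count how often its top-level print executed."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "spawn_demo.py"
        script.write_text(DEMO)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, check=True,
        )
    return result.stdout.count("top-level code ran")


if __name__ == "__main__":
    print(count_top_level_runs())
```

The top-level print runs once in the parent and once more when the spawned child re-imports the script, while the guarded block runs only in the parent.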

But this also means that any code you have at global scope will get re-executed for each newly created process. And what you have at global scope is:

driver = uc.Chrome()
# using context manager bcs of undetected_chromedriver module 
with driver:
    driver.get('https://some_site.com/')
    ...

And that is why you get a second window. You would need to do a bit of code rearranging and explicitly pass the driver to your functions. Otherwise, I have not checked your logic:

if __name__ == '__main__':
    driver = uc.Chrome()
    # using context manager bcs of undetected_chromedriver module 
    with driver:
        driver.get('https://some_site.com/')    
        login(driver)
        results = do_exercise(driver)
        print(results)

But I doubt that the driver can be pickled, that is, serialized and de-serialized (the standard ChromeDriver cannot). Pickling is required to pass an object from the address space of the main process to the address space of the sub-process. Therefore, I don't think you will be able to get this to work with multiprocessing.
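A quick way to find out whether an object could be passed to a spawned process is to try pickling it yourself. The sketch below is an illustration only: FakeDriver is a made-up stand-in that holds a lock, which is one kind of live resource the standard pickle module refuses to serialize, and can_pickle is a hypothetical helper. Nothing here says what a real driver object actually holds.

```python
import pickle
import threading


class FakeDriver:
    """Hypothetical stand-in for a driver; not a Selenium class."""

    def __init__(self):
        # Live OS-level resources such as locks cannot be pickled.
        self._lock = threading.Lock()


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:  # pickle raises TypeError, PicklingError, etc.
        return False
    return True


if __name__ == "__main__":
    print(can_pickle({"url": "https://some_site.com/"}))
    print(can_pickle(FakeDriver()))
```

Running can_pickle(driver) on the real driver before handing it to Process(args=...) would tell you whether the rearranged code has any chance of working under spawn.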

But I believe you have an even bigger problem, revealed by the next approach:

Since your worker functions are doing a bit of sleeping in between checking, the next "ploy" would be just to replace your multiprocessing.Process instances with threading.Thread instances. You should then not have to use a shared Value instance for communication. You then might have to put a small sleep call in type_text to give check_results a chance to run. But according to Can Selenium use multi threading in one browser?, ChromeDriver is not thread-safe and I would guess that the same holds true for undetected_chromedriver.

You could try it, but I have my very strong doubts about the results. If it turns out that you can't concurrently execute calls such as find_element_by_xpath and send_keys reliably in different threads using the same driver instance, multithreading will not work either.
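If you do want to experiment with threads, the shape of the code could look roughly like the sketch below. It is only a sketch under several assumptions: FakeDriver, send_key and results_text are made-up stand-ins rather than Selenium APIs, a threading.Event replaces the shared Value, and a threading.Lock is used so that only one thread talks to the driver at a time. Whether serializing the calls this way is enough for a real ChromeDriver session is exactly the open question above.

```python
import threading
import time


class FakeDriver:
    """Hypothetical stand-in for a WebDriver; not a Selenium API."""

    def __init__(self, finish_after):
        self.typed = []
        self.finish_after = finish_after

    def send_key(self, letter):
        self.typed.append(letter)

    def results_text(self):
        # Pretend the results appear once enough letters were typed.
        return "done" if len(self.typed) >= self.finish_after else ""


def check_results(driver, driver_lock, finished, poll=0.01):
    while not finished.is_set():
        with driver_lock:  # one thread at a time touches the driver
            text = driver.results_text()
        if text != "":
            finished.set()
            return
        time.sleep(poll)


def type_text(driver, driver_lock, finished, text, delay=0.005):
    for letter in text:
        if finished.is_set():  # cheap in-memory check, no driver call
            break
        with driver_lock:
            driver.send_key(letter)
        time.sleep(delay)  # gives check_results a chance to run


def do_exercise(driver, text):
    driver_lock = threading.Lock()
    finished = threading.Event()
    checker = threading.Thread(
        target=check_results, args=(driver, driver_lock, finished))
    typer = threading.Thread(
        target=type_text, args=(driver, driver_lock, finished, text))
    checker.start()
    typer.start()
    typer.join()
    finished.set()  # stop the checker if the text ran out first
    checker.join()
    return len(driver.typed)


if __name__ == "__main__":
    print(do_exercise(FakeDriver(finish_after=5), "x" * 200))
```

Because the driver is passed in explicitly and created by the caller, nothing runs at global scope, and both threads share the one driver object without any pickling.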

Upvotes: 2
