Reputation: 154
I'm developing a web bot that completes typing exercises on a certain website. To do so I'm using Selenium and undetected-chromedriver (an optimized version of the regular chromedriver that does not trigger anti-bot services; the functionality remains the same). The bot opens the website, logs in, opens the current exercise and starts typing. I need it to stop typing once the exercise is done, and the only way to detect that is to keep checking whether the results have appeared and then stop the type_text function. Originally, inside type_text's for loop, I checked var_to_check after each letter was typed. But that slows the typing down too much, because var_to_check is checked after every single letter. So I thought about using multiprocessing and running the check concurrently with type_text. I created the check_results function, which checks var_to_check every e.g. 0.1 s; if it detects that the exercise has ended, it sets shared_var to 1, and then type_text detects this and stops typing.
The problem is that as soon as the check_results and type_text processes are started, two new Chrome windows open, and in both of them the script starts all over again: it opens the website and tries to log me in, but that fails because I'm already logged in in the first window. Meanwhile, nothing happens in the original window. No errors are thrown.
I'm fairly new to multiprocessing, so this might not be the best approach (e.g. using multiprocessing.Value as shared memory between processes), but that's what I came up with so far, so any improvements or better ways of handling this problem are appreciated. But I don't understand why it is opening two new windows and starting the script "all over again" in them. I need both processes to run on the original Chrome window. I tried using the regular Selenium webdriver instead of undetected-chromedriver, but nothing changed.
Here is the code; it's shortened from my original so it's more readable, but the principle stays the same:
from multiprocessing import Process, Value
import time

from undetected_chromedriver import v2 as uc  # selenium webdriver add-on

driver = uc.Chrome()
# using context manager bcs of undetected_chromedriver module
with driver:
    driver.get('https://some_site.com/')


def login(driver=driver):
    # logs me into the website and opens typing exercise
    some_code


def check_results(shared_var, driver=driver):
    # this func i need to run concurrently with type_text
    keep_checking = 1
    while keep_checking:
        var_to_check = driver.find_element_by_xpath(some_xpath).text
        if var_to_check != '':
            with shared_var.get_lock():
                shared_var.value = 1
            keep_checking = 0
        time.sleep(0.1)


def type_text(shared_var, driver=driver):
    text = 'Some text i need to be typed out'
    some_element = driver.find_element_by_xpath(some_xpath)
    for letter in text:
        some_element.send_keys(letter)
        # original way i did the check but it slows down the typing
        # var_to_check = driver.find_element_by_xpath(some_xpath).text
        # if var_to_check != '':
        #     break

        # after every letter i need to check state of shared_var
        with shared_var.get_lock():
            if shared_var.value == 1:
                break
    print('Typing ended')


def do_exercise():
    exc_ended = Value("i", 0)
    check_var_process = Process(
        target=check_results, args=(exc_ended, ))
    type_text_process = Process(
        target=type_text, args=(exc_ended, ))
    check_var_process.start()
    type_text_process.start()
    # at this point it opens two new chrome windows
    check_var_process.join()
    type_text_process.join()
    results_of_typing = driver.find_element_by_xpath(some_xpath).text
    return results_of_typing


if __name__ == '__main__':
    login()
    results = do_exercise()
    print(results)
Upvotes: 1
Views: 2360
Reputation: 44128
First, when you post a question tagged with multiprocessing you are supposed to also add a tag specifying the platform you are running under. You should have sacrificed one of your tags, such as selenium-webdriver, to do that. But I suspect that you are running under a platform such as Windows that uses the spawn method to create new processes. This means that to start a new process, a new, empty address space is created, a Python interpreter is loaded and your source file is re-executed from the top. This is why the code that creates new processes must be controlled by if __name__ == '__main__':. If it weren't, you would get into a recursive loop creating new processes ad infinitum.
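To find out which case applies on a given machine, a minimal sketch like the following can help. It only reports the start method Python will use; the helper name main_module_is_reimported and the REIMPORTING_METHODS set are made up for this illustration. It relies on the documented behaviour that the spawn and forkserver start methods import the main module afresh in the child, while fork does not.

```python
import multiprocessing as mp

# Start methods under which a child process imports the main module afresh,
# which re-runs every top-level statement in it.
REIMPORTING_METHODS = {"spawn", "forkserver"}


def main_module_is_reimported(method=None):
    """Return True if children started with `method` re-import the main module."""
    if method is None:
        method = mp.get_start_method()
    return method in REIMPORTING_METHODS


if __name__ == "__main__":
    method = mp.get_start_method()
    print(f"start method: {method}")
    print(f"top-level code re-runs in children: {main_module_is_reimported(method)}")
```

If this reports spawn (the only method available on Windows), anything at module level, including a call that opens a browser, runs once per child process.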
But this also means that any code you have at global scope will get re-executed for each newly created process. And what you have at global scope is:
driver = uc.Chrome()
# using context manager bcs of undetected_chromedriver module
with driver:
    driver.get('https://some_site.com/')
...
And that is why each newly created process opens a window of its own. You would need to rearrange the code a bit and explicitly pass the driver to your functions. Beyond that, I have not checked your logic:
if __name__ == '__main__':
    driver = uc.Chrome()
    # using context manager bcs of undetected_chromedriver module
    with driver:
        driver.get('https://some_site.com/')
    login(driver)
    results = do_exercise(driver)
    print(results)
But I doubt that the driver can be pickled (the standard ChromeDriver cannot), that is, serialized and de-serialized, which is required for it to be passed from the address space of the main process to the address space of the sub-process. Therefore, I don't think you will be able to get this to work with multiprocessing.
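The pickling concern can be illustrated without a browser. The sketch below uses a hypothetical FakeDriver class (not the Selenium or undetected_chromedriver class) that holds a threading.Lock as a stand-in for the live resources a real driver owns, and a made-up helper can_be_pickled that simply attempts pickle.dumps. Whether a particular real driver object fails in the same way is something you would have to try on your own setup.

```python
import pickle
import threading


class FakeDriver:
    """Hypothetical stand-in for a browser driver; not the Selenium class."""

    def __init__(self):
        # A lock stands in for the live OS resources a real driver holds.
        self._lock = threading.Lock()


def can_be_pickled(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


if __name__ == "__main__":
    print(can_be_pickled({"url": "https://some_site.com/"}))  # plain data
    print(can_be_pickled(FakeDriver()))  # object holding a lock
```

Running can_be_pickled on the real driver before passing it in args=(...) to a Process would show quickly whether this route is open at all.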
But I believe you have an even bigger problem, revealed by the next approach:
Since your worker functions do a bit of sleeping between checks, the next "ploy" would be to replace your multiprocessing.Process instances with threading.Thread instances. You should then not need a shared Value instance for communication. You might then have to put a small sleep call in type_text to give check_results a chance to run. But according to Can Selenium use multi threading in one browser?, ChromeDriver is not thread-safe, and I would guess that the same holds true for undetected_chromedriver.
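As a rough sketch of what the threaded version could look like, the code below replaces the browser with a hypothetical FakeExercise class that reports a result once a given number of keys has been sent, and uses a threading.Event (my choice here, in place of the shared Value) to tell the typist to stop. The class, the limit parameter and the delays are all assumptions for illustration. Note that both threads touch the fake object concurrently, which is exactly what may not be safe with a real driver.

```python
import threading
import time


class FakeExercise:
    """Hypothetical stand-in for the page: shows a result after `limit` keys."""

    def __init__(self, limit):
        self.limit = limit
        self.typed = []

    def send_key(self, letter):
        self.typed.append(letter)

    def result(self):
        return "done" if len(self.typed) >= self.limit else ""


def check_results(exercise, stop_event, interval=0.01):
    """Poll for a result and signal the typist once one appears."""
    while not stop_event.is_set():
        if exercise.result() != "":
            stop_event.set()
        else:
            time.sleep(interval)


def type_text(exercise, text, stop_event, delay=0.005):
    """Type letter by letter; stop early once stop_event is set."""
    for letter in text:
        if stop_event.is_set():
            break
        exercise.send_key(letter)
        time.sleep(delay)  # brief pause so the checker thread gets to run


def do_exercise(text, limit):
    exercise = FakeExercise(limit)
    stop_event = threading.Event()
    checker = threading.Thread(target=check_results, args=(exercise, stop_event))
    typist = threading.Thread(target=type_text, args=(exercise, text, stop_event))
    checker.start()
    typist.start()
    typist.join()
    stop_event.set()  # also releases the checker if the text ran out first
    checker.join()
    return len(exercise.typed)


if __name__ == "__main__":
    print(do_exercise("x" * 300, limit=20), "keys typed before stopping")
```

Checking an Event with is_set() is cheap compared with a find_element call, so the per-letter check no longer slows the typing loop noticeably.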
You could try it, but I have very strong doubts about the results. If it turns out that you can't reliably execute calls such as find_element_by_xpath and send_keys concurrently in different threads using the same driver instance, multithreading will not work either.
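If you do experiment with threads, one thing worth trying is to make sure the driver calls never actually overlap by routing every call through a single lock. The sketch below shows only the locking pattern on a plain counter; SerializedCalls and demo are hypothetical names, and whether serializing calls like this is enough to make a real driver behave is an assumption I have not verified.

```python
import threading


class SerializedCalls:
    """Run callables one at a time, whichever thread submits them."""

    def __init__(self):
        self._lock = threading.Lock()

    def call(self, func, *args, **kwargs):
        # Only one thread at a time gets past this point.
        with self._lock:
            return func(*args, **kwargs)


def demo(threads=4, per_thread=1000):
    """Increment a shared counter from several threads through the gate."""
    gate = SerializedCalls()
    state = {"count": 0}

    def bump():
        state["count"] += 1

    def worker():
        for _ in range(per_thread):
            gate.call(bump)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return state["count"]


if __name__ == "__main__":
    print(demo())
```

With a driver, the calls passed to gate.call would be things like the element lookup in check_results and send_keys in type_text, so that a lookup and a keystroke are never in flight at the same moment.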
Upvotes: 2