Reputation: 107
I recently started testing Selenium for some personal projects, and one problem I ran into was being banned from some websites due to reCAPTCHA v3 tests. After some more research I found the reCAPTCHA v3 demo, did some testing, and eventually wrote this:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36")
driver = webdriver.Chrome(options=options, executable_path=ChromeDriverManager().install())
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
    """
})
driver.get("https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php")
WebDriverWait(driver, 10).until(EC.title_contains("Index"))
I have looked at various Stack Overflow questions, including the following:
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Can a website detect when you are using selenium with chromedriver?
How does recaptcha 3 know I'm using selenium/chromedriver?
and more
While the added arguments do help to improve the reCAPTCHA v3 score, it is still extremely inconsistent. About half the time I receive a passing score of .7, and the other half I receive a failing score of .1.
Please help me improve my reCAPTCHA scores so that I pass consistently.
EDIT 1: Signing into a Google account in the Chrome instance often changes the results of the demo, but it still does not entirely prevent failing scores.
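To put numbers on the inconsistency described in the question, the scores from repeated page loads can be summarized. This is only a minimal sketch: the helper name `summarize_scores`, the 0.5 pass threshold, and the sample values are assumptions for illustration, not something the demo or Google defines.

```python
from statistics import mean

# Assumed cutoff for "passing"; each site picks its own threshold.
PASS_THRESHOLD = 0.5


def summarize_scores(scores, threshold=PASS_THRESHOLD):
    """Summarize a list of reCAPTCHA v3 scores gathered over several runs."""
    if not scores:
        raise ValueError("need at least one score")
    passed = sum(1 for s in scores if s >= threshold)
    return {
        "runs": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
        "pass_rate": passed / len(scores),
    }


# Illustrative values only, mirroring the "half .7, half .1" pattern.
print(summarize_scores([0.7, 0.1, 0.7, 0.1]))
```

Comparing the pass rate before and after a configuration change gives a clearer picture than a single run.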
Upvotes: 1
Views: 5082
Reputation: 1
If you can scrape the pages without JavaScript, then disabling JavaScript while you scrape might do the trick for you.
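One way this might look with Chrome is a content-settings preference passed through `ChromeOptions`. This is a sketch under assumptions: the preference key below is a commonly used Chrome setting rather than something taken from this thread, and the helper names `no_javascript_prefs` and `make_driver_without_javascript` are hypothetical. Since reCAPTCHA v3 runs as a script in the page, it only helps on sites that still serve the content you need without it.

```python
def no_javascript_prefs():
    """Chrome preferences that block JavaScript (value 2 means 'block')."""
    # Assumption: this content-settings key is honored by the Chrome in use.
    return {"profile.managed_default_content_settings.javascript": 2}


def make_driver_without_javascript():
    """Start Chrome with JavaScript blocked; requires Selenium and chromedriver."""
    # Imported lazily so the prefs helper can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", no_javascript_prefs())
    return webdriver.Chrome(options=options)
```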
Upvotes: 0
Reputation: 55002
Nobody except Google really knows how these are scored, but I think we can guess at some obvious factors:
residential / business IP vs. datacenter
Google / OAuth cookies
obvious things like user agent and browser fingerprinting.
HTH.
Upvotes: 0
Reputation: 193208
To increase your reCAPTCHA v3 score from .7 to a higher level, i.e. .9 or so, you can rotate the user agent through execute_cdp_cmd() as follows:
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientA"}})
If necessary, you can switch to a different user agent later by issuing the command again with another value:
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientA"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientB"}})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browserClientC"}})
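The repeated calls above could be wrapped in a small helper that cycles through a list of user agents. This is a minimal sketch, not part of Selenium: `make_user_agent_rotator` is a hypothetical name, and it only assumes the driver exposes `execute_cdp_cmd()` as used above.

```python
from itertools import cycle


def make_user_agent_rotator(driver, user_agents):
    """Return a function that applies the next user agent on each call."""
    if not user_agents:
        raise ValueError("need at least one user agent")
    agents = cycle(user_agents)

    def apply_next():
        ua = next(agents)
        # Same CDP command as above, with the next value from the list.
        driver.execute_cdp_cmd(
            "Network.setExtraHTTPHeaders", {"headers": {"User-Agent": ua}}
        )
        return ua

    return apply_next


# Usage with a real driver (not run here):
# rotate = make_user_agent_rotator(driver, ["browserClientA", "browserClientB"])
# rotate()  # call before each driver.get(...)
```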
So effectively your working solution would be:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
    """
})
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": {"User-Agent": "browser1"}})
driver.get("https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "li.step3 pre.response"))).get_attribute("innerHTML"))
Console Output:
DevTools listening on ws://127.0.0.1:53748/devtools/browser/eac086e8-f1c0-42d3-8ef8-d132f4b4c82b
{
  "success": true,
  "hostname": "recaptcha-demo.appspot.com",
  "challenge_ts": "2020-01-20T22:31:32Z",
  "apk_package_name": null,
  "score": 0.9,
  "action": "examples/v3scores",
  "error-codes": []
}
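Rather than reading the score off the printed output, the response text can be parsed. The following is a rough sketch: `extract_score` is a hypothetical helper, and it assumes the element's innerHTML is the JSON shown above, possibly with HTML-escaped quotes.

```python
import html
import json


def extract_score(inner_html):
    """Parse the demo's JSON response text and return the score as a float."""
    # innerHTML may contain entities such as &quot;, so unescape first.
    data = json.loads(html.unescape(inner_html))
    if not data.get("success"):
        raise ValueError(f"verification failed: {data.get('error-codes')}")
    return float(data["score"])


# Usage with the string returned by get_attribute("innerHTML") (not run here):
# score = extract_score(response_text)
```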
Upvotes: 2