Reputation: 835
I am trying to scrape a site that is protected by a Cloudflare bot check. I currently use
import undetected_chromedriver as uc
with a portable Chrome (chrome.exe).
However, this does not get me past the bot check, so now I am trying
from seleniumbase import SB
instead, using
SB(uc=True,agent=user_agent_cycle,binary_location=chromedriver_path) as sb:
The script freezes at the with statement, and all I get is:
PS D:\code\Arcgis\FissionStaking2> python .\testbaseuc3.py
<itertools.cycle object at 0x0000027605C89D80>
Here is the full code:
from seleniumbase import SB
chromedriver_path = "C:\\temp\\GoogleChromePortable64-132\\App\\Chrome-bin\\chrome.exe"
import random
import itertools
def user_agent_rotator(user_agent_list):
    # shuffle the User Agent list
    random.shuffle(user_agent_list)
    # rotate the shuffle to ensure all User Agents are used
    return itertools.cycle(user_agent_list)
# define Chrome options
#options = SB.ChromeOptions()
# create a User Agent list
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    # ... add more User Agents
]
# initialize a generator for the User Agent rotator
user_agent_cycle = user_agent_rotator(user_agents)
print(user_agent_cycle)
with SB(uc=True,agent=user_agent_cycle,binary_location=chromedriver_path) as sb:
    print(1)
    sb.open("https://google.com/ncr")
    print(2)
    sb.type('[title="Search"]', "SeleniumBase GitHub page\n")
    print(3)
    sb.click('[href*="github.com/seleniumbase/"]')
    sb.save_screenshot_to_logs() # ./latest_logs/
    print(sb.get_page_title())
Upvotes: 0
Views: 49
Reputation: 15556
It looks like you're mixing up the browser binary (Chrome) with the driver (chromedriver). They are not the same thing: your chromedriver_path variable actually points to chrome.exe. Also, the default User Agent that SeleniumBase gives you is already the optimal one. This should be all you need to perform a Google search:
from seleniumbase import SB
with SB(test=True, uc=True) as sb:
    sb.open("https://google.com/ncr")
    sb.type('[title="Search"]', "SeleniumBase GitHub page\n")
    sb.click('[href*="github.com/seleniumbase/"]')
    sb.save_screenshot_to_logs() # ./latest_logs/
    print(sb.get_page_title())
Upvotes: 0
Reputation: 112
The main idea here is to pass a valid User-Agent string instead of a generator object, and to adjust the browser fingerprint settings to avoid the bot check. First, replace the cyclic object created by itertools.cycle with a single randomly selected User-Agent string (e.g., using random.choice(user_agents)). Make sure SeleniumBase is initialized with the correct browser binary path and its anti-detection options (for example uc=True). If you are still being detected, try the selenium-stealth module and clear the browser fingerprint data so that each session stays independent.
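The User-Agent part of this advice could look roughly like the sketch below. It is a minimal illustration, not a tested fix for the Cloudflare check: pick_user_agent and run_search are hypothetical helper names, the browser_path argument is an assumption standing in for the path to chrome.exe, and the SB keyword arguments are the same ones used in the question.

```python
import random

# User-Agent strings copied from the question.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
]


def pick_user_agent(user_agents):
    """Hypothetical helper: return ONE User-Agent string, not an iterator."""
    return random.choice(user_agents)


def run_search(browser_path=None):
    """Hypothetical helper: open a page with a single User-Agent string.

    browser_path is assumed to be the path to the Chrome browser binary
    (chrome.exe), not to chromedriver. None leaves the default browser.
    """
    # Imported here so pick_user_agent() works without SeleniumBase installed.
    from seleniumbase import SB

    with SB(uc=True, agent=pick_user_agent(USER_AGENTS),
            binary_location=browser_path) as sb:
        sb.open("https://google.com/ncr")
        print(sb.get_page_title())
```

Calling run_search() with the portable Chrome path from the question would pass a plain string as agent. If you prefer to keep the itertools.cycle rotation, next(user_agent_cycle) also yields a single string that can be passed the same way.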
Upvotes: 0