Reputation: 835

Trying to use chrome with seleniumbase and uc=true option

I am trying to scrape a site that has a Cloudflare bot check. I currently use

import undetected_chromedriver as uc

and a portable chrome.exe.

However, this does not seem to get me past the bot check, so now I am going to try

from seleniumbase import SB

with this call:

with SB(uc=True, agent=user_agent_cycle, binary_location=chromedriver_path) as sb:

The script freezes at the with statement, and all I get is:

PS D:\code\Arcgis\FissionStaking2> python .\testbaseuc3.py
<itertools.cycle object at 0x0000027605C89D80>

Here is the full code:

from seleniumbase import SB
chromedriver_path = "C:\\temp\\GoogleChromePortable64-132\\App\\Chrome-bin\\chrome.exe"
import random
import itertools

def user_agent_rotator(user_agent_list):
    # shuffle the User Agent list
    random.shuffle(user_agent_list)
    # rotate the shuffle to ensure all User Agents are used
    return itertools.cycle(user_agent_list)


# define Chrome options
#options = SB.ChromeOptions()

# create a User Agent list
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    # ... add more User Agents
]

# initialize a generator for the User Agent rotator
user_agent_cycle = user_agent_rotator(user_agents)

print(user_agent_cycle)

with SB(uc=True,agent=user_agent_cycle,binary_location=chromedriver_path) as sb:
    print(1)
    sb.open("https://google.com/ncr")
    print(2)
    sb.type('[title="Search"]', "SeleniumBase GitHub page\n")
    print(3)
    sb.click('[href*="github.com/seleniumbase/"]')
    sb.save_screenshot_to_logs()  # ./latest_logs/
    print(sb.get_page_title())

Upvotes: 0

Answers (2)

Michael Mintz

Reputation: 15556

It looks like you're mixing up the browser binary (Chrome) with the driver (chromedriver). They are not the same thing. Also, the default User-Agent that SeleniumBase gives you is already the optimal one. This should be all you need to perform a Google search:

from seleniumbase import SB

with SB(test=True, uc=True) as sb:
    sb.open("https://google.com/ncr")
    sb.type('[title="Search"]', "SeleniumBase GitHub page\n")
    sb.click('[href*="github.com/seleniumbase/"]')
    sb.save_screenshot_to_logs()  # ./latest_logs/
    print(sb.get_page_title())
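
If the portable Chrome build from the question is still needed, a minimal sketch of passing it might look like the following. It assumes binary_location is meant to receive the path to chrome.exe (the browser), not to chromedriver. The helper is_driver_path and the function run_search are hypothetical names introduced only for illustration, and the path is copied from the question.

```python
from pathlib import PureWindowsPath


def is_driver_path(path: str) -> bool:
    """Return True if the file name looks like chromedriver rather than Chrome."""
    name = PureWindowsPath(path).name.lower()
    return name.startswith("chromedriver")


# Path to the browser itself, named accordingly (taken from the question).
chrome_binary_path = "C:\\temp\\GoogleChromePortable64-132\\App\\Chrome-bin\\chrome.exe"


def run_search(binary_path: str) -> None:
    """Open Google with the given Chrome binary (requires seleniumbase)."""
    # Imported lazily so the helper above works without seleniumbase installed.
    from seleniumbase import SB

    if is_driver_path(binary_path):
        raise ValueError("binary_location should point to Chrome, not chromedriver")
    with SB(test=True, uc=True, binary_location=binary_path) as sb:
        sb.open("https://google.com/ncr")
        print(sb.get_page_title())
```

Calling run_search(chrome_binary_path) would start the browser. Naming the variable after the browser rather than the driver makes it harder to confuse the two.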

Upvotes: 0

bigmacsetnotenough

Reputation: 112

The main idea here is to pass a valid User-Agent string instead of a generator object, and to tweak the browser fingerprint settings to avoid the bot check. First, replace the cycle object created by itertools.cycle with a single User-Agent string chosen at random (e.g., using random.choice(user_agents)). Make sure SeleniumBase is initialized with the correct browser binary path and anti-automation options. If you’re still being detected, try the selenium-stealth module, and clear the browser fingerprint data so that each session stays independent.
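
The User-Agent part of this advice can be sketched as follows. This is only an illustration under stated assumptions: the list is the one from the question, pick_user_agent and run_search are hypothetical helper names, and the sketch does not cover selenium-stealth or fingerprint clearing.

```python
import random

# User-Agent list copied from the question.
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
]


def pick_user_agent(candidates):
    """Return one User-Agent string chosen at random from the list."""
    return random.choice(candidates)


def run_search(user_agent, binary_path):
    """Open Google with a single User-Agent string (requires seleniumbase)."""
    # Imported lazily so the selection logic works without seleniumbase installed.
    from seleniumbase import SB

    with SB(uc=True, agent=user_agent, binary_location=binary_path) as sb:
        sb.open("https://google.com/ncr")
        print(sb.get_page_title())


# A plain str is selected here, not an itertools.cycle object.
user_agent = pick_user_agent(user_agents)
```

If rotation across runs is still wanted, calling next(user_agent_cycle) on the cycle from the question would also yield a single string each time, which can then be passed as agent.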

Upvotes: 0
