Reputation: 11
This is a web scraping project I'm working on.
I need to submit the response for this reCAPTCHA v2, but the request is not returning the data I need:
```python
import re

import requests

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}
url = 'https://www2.detran.rn.gov.br/externo/consultarveiculo.asp'

session = requests.session()
fazer_get = session.get(url, headers=headers)
cookie = fazer_get.cookies
html = fazer_get.text

# Pull the reCAPTCHA site key out of the page markup.
rgxCaptchaKey = re.search(
    r'<div\s*class="g-recaptcha"\s*data-sitekey="([^"]*?)"></div>',
    html, re.IGNORECASE)
if rgxCaptchaKey is None:
    raise SystemExit('erro')  # no site key found, so there is nothing to solve
captchaKey = rgxCaptchaKey.group(1)

# captcha() and KEY (the solver helper and its API key) are defined elsewhere, not shown here.
resposta_captcha = captcha(captchaKey, url, KEY)

placa = 'pcj90'
renavam = '57940'
payload = {
    'oculto': 'AvancarC',
    'placa': placa,
    'renavam': renavam,
    'g-recaptcha-response': resposta_captcha['code'],
    'btnConsultaPlaca': ''
}
# The session already carries the cookies from the GET; they are passed explicitly as well.
fazerPost = session.post(
    url, data=payload,
    headers=headers,
    cookies=cookie)
```
I tried sending the captcha response in the payload, but I still couldn't get to the page I want.
Upvotes: 1
Views: 4943
Reputation: 625
If the website you're trying to scrape is protected by reCAPTCHA, your best bet is to use a stealthy scraping method. That means either Selenium (with at least selenium-stealth) or a third-party web scraper, such as WebScrapingAPI, where I'm an engineer.
The advantage of a third-party service is that it usually comes with reCAPTCHA solving, IP rotation and other features that help avoid bot detection, so you can focus on handling the scraped data rather than building the scraper.
To compare the two options, here is an implementation example of each:
1. Python With Stealthy Selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium_stealth import stealth
from bs4 import BeautifulSoup

URL = 'https://www2.detran.rn.gov.br/externo/consultarveiculo.asp'

# Hide the most obvious automation flags Chrome exposes.
options = Options()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options)

# Patch browser properties that bot detectors commonly inspect.
stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True)

driver.get(URL)
html = driver.page_source
driver.quit()

# Parse the rendered HTML for further extraction.
soup = BeautifulSoup(html, 'html.parser')
You should also look into integrating a captcha solver (like 2captcha) with this code.
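As a rough sketch of what that integration could look like: the snippet below submits the site key to 2captcha's HTTP API (`in.php` / `res.php`), polls for the token, and writes it into the page's `g-recaptcha-response` textarea. The helper names (`solve_recaptcha`, `inject_token`, `http_get_json`), the polling interval and the retry limit are my own assumptions and not part of any library, so check the endpoints and parameters against 2captcha's current documentation before relying on it.

```python
import json
import time
import urllib.parse
import urllib.request

API_BASE = "https://2captcha.com"  # assumed base URL of the 2captcha HTTP API


def http_get_json(url, params):
    """GET `url` with query `params` and decode the JSON response body."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{url}?{query}", timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def solve_recaptcha(api_key, site_key, page_url, fetch=http_get_json,
                    poll_interval=5.0, max_polls=24):
    """Submit a reCAPTCHA v2 job and poll until the service returns a token."""
    job = fetch(f"{API_BASE}/in.php", {
        "key": api_key,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url,
        "json": 1,
    })
    if job.get("status") != 1:
        raise RuntimeError(f"captcha submit failed: {job.get('request')}")

    for _ in range(max_polls):
        time.sleep(poll_interval)
        result = fetch(f"{API_BASE}/res.php", {
            "key": api_key,
            "action": "get",
            "id": job["request"],
            "json": 1,
        })
        if result.get("status") == 1:
            return result["request"]  # the solved token
        if result.get("request") != "CAPCHA_NOT_READY":
            raise RuntimeError(f"captcha solve failed: {result.get('request')}")
    raise TimeoutError("captcha was not solved in time")


def inject_token(driver, token):
    """Write the token into the hidden textarea that reCAPTCHA v2 adds to the form."""
    driver.execute_script(
        "document.getElementById('g-recaptcha-response').value = arguments[0];",
        token,
    )
```

You would call `solve_recaptcha(API_KEY, site_key, URL)` after `driver.get(URL)` and before `driver.quit()`, pass the result to `inject_token(driver, token)`, and then fill in and submit the form. Whether this particular site accepts a token injected this way is something you would need to verify.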
2. Python With WebScrapingAPI
import requests

URL = 'https://www2.detran.rn.gov.br/externo/consultarveiculo.asp'
API_KEY = '<YOUR_API_KEY>'
SCRAPER_URL = 'https://api.webscrapingapi.com/v1'

# js_instructions is a JSON array of steps the service runs in the rendered page:
# fill in the plate and RENAVAM fields, then submit the form.
params = {
    "api_key": API_KEY,
    "url": URL,
    "render_js": "1",
    "js_instructions": '''
    [{
        "action": "value",
        "selector": "input#placa",
        "timeout": 5000,
        "value": "<YOUR_PLACA>"
    },
    {
        "action": "value",
        "selector": "input#renavam",
        "timeout": 5000,
        "value": "<YOUR_RENAVAM>"
    },
    {
        "action": "submit",
        "selector": "button#btnConsultaPlaca",
        "timeout": 5000
    }]
    '''
}

res = requests.get(SCRAPER_URL, params=params)
print(res.text)
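If you'd rather not hand-write the JSON string, one option is to build the steps as Python objects and serialize them with `json.dumps`, which rules out quoting and comma mistakes. This is only a sketch: `build_js_instructions` is a hypothetical helper of mine, and the action names and selectors are simply the ones from the example above, not verified against the live page.

```python
import json


def build_js_instructions(placa, renavam, timeout_ms=5000):
    """Serialize the fill-and-submit steps into a JSON string."""
    steps = [
        # Fill in the plate field.
        {"action": "value", "selector": "input#placa",
         "timeout": timeout_ms, "value": placa},
        # Fill in the RENAVAM field.
        {"action": "value", "selector": "input#renavam",
         "timeout": timeout_ms, "value": renavam},
        # Submit the form.
        {"action": "submit", "selector": "button#btnConsultaPlaca",
         "timeout": timeout_ms},
    ]
    return json.dumps(steps)


# Values taken from the question; replace them with your own.
js_instructions = build_js_instructions("pcj90", "57940")
```

The resulting string can be dropped into `params["js_instructions"]` in place of the triple-quoted literal.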
Upvotes: 1