I am trying to open a web page that shows an alert box when it opens. When we set `await page.setRequestInterception(True)`, the page load takes forever and never finishes. I also tried dismissing the dialog box, but nothing works. Here is my code:
```python
import asyncio

from pyppeteer import launch
from utils.agents import get_user_agent  # my own helper module

url = "http://mogilitycapital.com"
netcalls = []


async def handle_request_redirects(request, url):
    # Block heavy resources; let every other request through.
    if request.resourceType in ['stylesheet', 'css', 'image', 'font']:
        await request.abort('blockedbyclient')
    else:
        await request.continue_()


async def intercept_network_response(response, netcalls):
    # Record each response so the page's network calls can be analysed later.
    netcalls.append(
        {
            "url": response.url,
            "method": response.request.method,
            "headers": response.headers,
            "status": response.status,
        }
    )


async def handle_dialog(dialog):
    # Fires for JavaScript dialogs (alert/confirm/prompt/beforeunload).
    if dialog.type == 'alert':
        await dialog.dismiss()
    elif dialog.type == 'confirm':
        await dialog.accept()


async def main():
    browser = await launch(headless=True,
                           ignoreHTTPSErrors=True,
                           acceptInsecureCerts=True,
                           # autoClose=False,
                           # handleSIGINT=False,
                           # handleSIGTERM=False,
                           # handleSIGHUP=False,
                           args=[
                               "--no-sandbox",
                               "--disable-gpu",
                               "--ignore-certificate-errors",
                               "--allow-running-insecure-content",
                               "--disable-web-security",
                               "--disable-setuid-sandbox",
                               '--disable-popup-blocking',
                               '--disable-dev-shm-usage',
                               # '--single-process',
                               '--no-zygote',
                               # "--user-data-dir=/tmp/chromium"
                           ])
    page = await browser.newPage()
    user_agent = get_user_agent()
    await page.setUserAgent(user_agent)
    await page.setRequestInterception(True)  # with this line, goto() never finishes

    page.on('request', lambda req: asyncio.ensure_future(handle_request_redirects(req, url)))
    page.on('response', lambda response: asyncio.ensure_future(intercept_network_response(response, netcalls)))
    page.on('dialog', lambda dialog: asyncio.ensure_future(handle_dialog(dialog)))

    resp = await page.goto(url, timeout=60000)
    await browser.close()


asyncio.get_event_loop().run_until_complete(main())
```
When we comment out `await page.setRequestInterception(True)`, the code works fine.
Our use case is to scrape more than a million domains, get the raw HTML, and understand the network calls (so we need interception). This is not only about this website: some websites ask for HTTP basic auth when we request them, and we just need to click Cancel on that prompt, but I am unable to do it. If you open the URL from my example, you will see the authentication box; we just need to click Cancel and extract whatever HTML comes back.
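To tell which domains fall into this basic-auth category across a large crawl, one option is to flag recorded responses with status 401 and a `WWW-Authenticate` header offering the `Basic` scheme. The sketch below is only a post-processing helper: `is_basic_auth_challenge` and `auth_protected_urls` are hypothetical names, it assumes the dict shape stored by `intercept_network_response` in `netcalls`, and it only helps if the 401 response actually gets recorded (it may not be, if navigation hangs first).

```python
import re
from typing import Dict, List, Mapping

# Matches the "Basic" auth scheme anywhere in the header value; several
# challenges may be listed, e.g. joined by commas or newlines.
_BASIC_SCHEME = re.compile(r"\bbasic\b", re.IGNORECASE)


def is_basic_auth_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Return True if a response looks like an HTTP Basic auth challenge.

    Header names are compared case-insensitively to be safe.
    """
    if status != 401:
        return False
    for name, value in headers.items():
        if name.lower() == "www-authenticate" and _BASIC_SCHEME.search(value):
            return True
    return False


def auth_protected_urls(netcalls: List[Dict]) -> List[str]:
    """Collect URLs from recorded netcalls that answered with a Basic challenge."""
    return [
        call["url"]
        for call in netcalls
        if is_basic_auth_challenge(call["status"], call["headers"])
    ]
```

Flagged domains could then be routed to a separate, non-browser fetch or simply skipped.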
Another acceptable result:

I debugged the program and it blocks in the request-interception handler. Maybe we need logic that aborts the request for these kinds of authentication-protected websites; that is also OK. Anything is fine.
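If giving up on such a domain is acceptable, one generic fallback is a hard outer deadline around each navigation, so a single stuck domain cannot stall a million-domain crawl. This is a minimal sketch built on the standard library's `asyncio.wait_for`; `with_deadline` is a hypothetical helper name. It does not click Cancel on the prompt; it only stops waiting for that domain.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def with_deadline(coro: Awaitable[T], seconds: float) -> Optional[T]:
    """Await `coro`, but give up after `seconds` and return None instead of hanging."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the inner awaitable at this point.
        return None
```

Inside `main()` this could be used as `resp = await with_deadline(page.goto(url, timeout=60000), 70)`; a `None` result would mark the domain as skipped, after which that page should presumably be closed before moving on.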