Pyd

Reputation: 6159

Pyppeteer closed unexpectedly for specific sites that have a password alert box

I am trying to open a web page that shows an alert box on load.

When I call await page.setRequestInterception(True), the navigation takes forever and never finishes.

I also tried dismissing the dialog box, but nothing works. Here is my code:

from pyppeteer import launch
from utils.agents import get_user_agent
import asyncio

browser = await launch(headless=True,
                               ignoreHTTPSErrors=True,
                               acceptInsecureCerts=True,

                               # autoClose=False,
                               # handleSIGINT=False,
                               # handleSIGTERM=False,
                               # handleSIGHUP=False,

                               args=[
                                   "--no-sandbox",
                                   "--disable-gpu",
                                   "--ignore-certificate-errors",
                                   "--allow-running-insecure-content",
                                   "--disable-web-security",
                                   "--disable-setuid-sandbox",
                                   '--disable-popup-blocking',
                                   '--disable-dev-shm-usage',
                                #    '--single-process',
                                #    '--disable-gpu',
                                   '--no-zygote'
                                   # "--user-data-dir=/tmp/chromium"
                               ])



page = await browser.newPage()

user_agent = get_user_agent()
await page.setUserAgent(user_agent)
await page.setRequestInterception(True)

url = "http://mogilitycapital.com"


netcalls = []


async def handle_request_redirects(request, url):

    if request.resourceType in ['stylesheet', 'css', 'image', 'font']:
        await request.abort('blockedbyclient')
 
    else:
        await request.continue_()

async def intercept_network_response(response, netcalls):
    netcalls.append(
        {
            "url": response.url,
            "method": response.request.method,
            "headers": response.headers,
            "status": response.status

        }
    )
    
async def handle_dialog(dialog):
    # Dismiss plain alerts, accept confirm dialogs.
    if dialog.type == 'alert':
        await dialog.dismiss()
    elif dialog.type == 'confirm':
        await dialog.accept()

page.on('request', lambda req: asyncio.ensure_future((handle_request_redirects(req, url))))
page.on('response', lambda response: asyncio.ensure_future(intercept_network_response(response, netcalls)))
page.on('dialog', lambda dialog: asyncio.ensure_future(handle_dialog(dialog)))

resp = await page.goto(url, timeout=60000)






When I comment out await page.setRequestInterception(True), the code works fine.

Our use case is to scrape more than a million domains, get the raw HTML, and record the network calls (which is why we need interception). This is not limited to this one website: some sites request basic auth when we load them, and we just need to click Cancel on that prompt, which I am unable to do. If you open the URL from my example, you will see the authentication box; we just need to click Cancel and extract whatever HTML comes back.

Another acceptable result:

I debugged the program, and it blocks in the network interceptor function. Maybe we need some logic to abort the request for these kinds of authentication websites; that is also OK.

Either of these is fine:

  1. Get the HTML after cancelling.
  2. Block these kinds of authentication websites.
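
To make option 2 concrete, here is a minimal sketch of how such sites could be flagged from the recorded network calls. It assumes the 401 response actually reaches the response handler, which may not happen while the navigation hangs. The helper name requires_http_auth and the sample entry are hypothetical; the dictionary keys match the netcalls entries built in the code above.

```python
from typing import Any, Dict, Iterable


def requires_http_auth(netcalls: Iterable[Dict[str, Any]]) -> bool:
    """Return True if any recorded response is a 401 with a WWW-Authenticate header.

    Hypothetical helper: it only inspects dictionaries shaped like the
    entries appended to `netcalls` (url, method, headers, status).
    """
    for call in netcalls:
        if call.get("status") != 401:
            continue
        headers = call.get("headers") or {}
        # Compare header names case-insensitively.
        if any(name.lower() == "www-authenticate" for name in headers):
            return True
    return False


# Made-up sample entry, for illustration only.
sample = [
    {
        "url": "http://example.com/",
        "method": "GET",
        "headers": {"www-authenticate": 'Basic realm="private"'},
        "status": 401,
    },
]
print(requires_http_auth(sample))
```

A domain flagged this way could be skipped or queued separately, rather than waiting on a prompt that nobody will answer.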
