Scrape .aspx form with Python

Question

I'm trying to scrape https://apps.neb-one.gc.ca/CommodityStatistics/Statistics.aspx, which on paper seems like an easy task, with plenty of resources from other SO questions. Nonetheless, I get the same error no matter how I change my request.

I've tried the following:

import requests
from bs4 import BeautifulSoup

url = "https://apps.neb-one.gc.ca/CommodityStatistics/Statistics.aspx"

with requests.Session() as s:
    s.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'})

    # GET the form first to obtain the ASP.NET state fields
    response = s.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    data = {
        "ctl00$MainContent$rdoCommoditySystem": "ELEC",
        "ctl00$MainContent$lbReportName": "171",
        "ctl00$MainContent$ddlFrom": "01/11/2018 12:00:00 AM",
        "ctl00$MainContent$rdoReportFormat": "Excel",
        "ctl00$MainContent$btnView": "View",
        # hidden state fields copied from the GET response
        "__EVENTVALIDATION": soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
        "__VIEWSTATE": soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        "__VIEWSTATEGENERATOR": soup.find('input', {'name': '__VIEWSTATEGENERATOR'}).get('value', '')
    }

    # POST through the same session so the cookies from the GET are sent back
    response = s.post(url, data=data)
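A sketch of a more systematic way to build that payload, assuming the hidden fields and dropdowns are present in the HTML returned by the GET: instead of picking three hidden fields by hand, collect every `<input type="hidden">` (this also picks up `__EVENTTARGET` and `__EVENTARGUMENT` if the page renders them) plus the option values of every `<select>`. That makes it possible to check whether values such as "171" or "01/11/2018 12:00:00 AM" are actually offered by the form. It uses only the standard-library `html.parser`; `AspNetFormParser` and `parse_form` are hypothetical names.

```python
from html.parser import HTMLParser


class AspNetFormParser(HTMLParser):
    """Collect hidden <input> values and <select> option values from a page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}    # name -> value for every <input type="hidden">
        self.options = {}   # select name -> list of option values, in page order
        self._current_select = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.hidden[a["name"]] = a.get("value") or ""
        elif tag == "select":
            self._current_select = a.get("name")
            if self._current_select:
                self.options.setdefault(self._current_select, [])
        elif tag == "option" and self._current_select:
            # Options without a value attribute are recorded as "" here.
            self.options[self._current_select].append(a.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None


def parse_form(html):
    """Return (hidden_fields, select_options) for an HTML page."""
    parser = AspNetFormParser()
    parser.feed(html)
    parser.close()
    return parser.hidden, parser.options
```

In the script above this would be used as `hidden, options = parse_form(response.text)`, then `data = {**hidden, "ctl00$MainContent$rdoCommoditySystem": "ELEC", ...}`. Printing `options.get("ctl00$MainContent$lbReportName")` shows whether "171" is among the values the server rendered on the initial page.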

When I print response.content, I get this message (tl;dr: "System error occurred. The system will alert technical support of the problem."):

b'
    Error
    System error occurred. The system will alert technical support of the problem.
'

I have tried other options, such as changing the __EVENTTARGET argument, as suggested elsewhere, and passing the cookie from the first request to the POST request. Checking the page source, I noticed that the form has a "query" function that needs __EVENTTARGET and __EVENTARGUMENT to work.

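For context, a sketch of the standard ASP.NET WebForms postback mechanism (general WebForms behaviour, not verified against this particular site): the client-side `__doPostBack(eventTarget, eventArgument)` copies its two arguments into the hidden `__EVENTTARGET` and `__EVENTARGUMENT` inputs and submits the form. An ordinary submit button such as `btnView` does not go through that function, so the browser sends both fields empty and includes the button's own name/value pair instead; empty values are therefore expected for the final "View" click. The hypothetical helper `build_postback` mirrors those two cases when assembling a POST body.

```python
def build_postback(hidden, fields, event_target="", event_argument="", button=None):
    """Assemble the form body a browser would send for an ASP.NET postback.

    hidden:  hidden inputs scraped from the *current* page (__VIEWSTATE, ...).
    fields:  values of the visible controls (radio buttons, list boxes, ...).
    event_target / event_argument: what __doPostBack(target, argument) would
        write into the hidden fields; empty for an ordinary submit button.
    button:  (name, value) of the submit button that was clicked, if any.
    """
    if button is not None and event_target:
        # __doPostBack submits via form.submit(), which sends no button name/value.
        raise ValueError("use either event_target or button, not both")
    data = dict(hidden)  # copy so the scraped dict is not mutated
    data.update(fields)
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    if button is not None:
        name, value = button
        data[name] = value
    return data
```

If choosing a commodity or report in the browser itself triggers a postback (an assumption; the Network tab would show an extra POST with a non-empty `__EVENTTARGET`), the server's event validation may reject a single POST that jumps straight to "View" with values it never rendered, which could surface as a generic error page like the one above. In that case, replay the intermediate postback with `event_target` set to exactly what the browser sends, re-read the hidden fields from that response, and only then post with `button=("ctl00$MainContent$btnView", "View")`.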

However, both arguments are empty in the body of the POST request (as can be checked in Chrome's developer tools). Another problem is that I need to either download the file in one of the available formats (PDF or Excel) or get the HTML version, but the .aspx form does not render the results on the same page; it opens a new URL, https://apps.neb-one.gc.ca/CommodityStatistics/ViewReport.aspx, with the information instead.
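A hedged sketch of that last step, assuming the server keeps the selected report in session state tied to the cookies, so that a GET to ViewReport.aspx through the same `requests.Session` returns it. `requests` follows HTTP redirects on POST by default, so if the server redirects, `response.url` will already end in ViewReport.aspx; if the browser opens the new URL from JavaScript instead, `requests` will not do that, and the page has to be fetched explicitly. `extension_for`, `save_report`, and the content-type table are illustrative assumptions; the real `Content-Type` can be checked in the browser's Network tab.

```python
# Map the Content-Type of the report response to a file extension.
# These MIME types are common defaults, assumed rather than confirmed for this site.
REPORT_EXTENSIONS = {
    "application/pdf": ".pdf",
    "application/vnd.ms-excel": ".xls",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": ".xlsx",
    "text/html": ".html",
}


def extension_for(content_type):
    """Return a file extension for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = content_type.split(";", 1)[0].strip().lower()
    return REPORT_EXTENSIONS.get(base, ".bin")


def save_report(session, report_url, stem):
    """GET report_url with an existing requests.Session and save the body.

    Reusing the session matters: the assumption is that the report chosen in
    the earlier POST is tied to the cookies that session already holds.
    """
    resp = session.get(report_url)
    resp.raise_for_status()
    path = stem + extension_for(resp.headers.get("Content-Type", ""))
    with open(path, "wb") as fh:
        fh.write(resp.content)  # raw bytes, so PDF/Excel files are not mangled
    return path
```

For example, after the POST: `path = save_report(s, "https://apps.neb-one.gc.ca/CommodityStatistics/ViewReport.aspx", "report_171")`. If `response.url` after the POST already points at ViewReport.aspx, the redirect was followed and `response.content` can be written to disk directly.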

I'm kind of lost here. What am I missing?
