jim jarnac

Reputation: 5152

Python Requests - debugging POST requests

I am trying to scrape a website where I have to reach the right page using a POST request.

Below are the screens showing how I found which headers and payload I needed to use in my request:

1) Here is the page, a list of economic indicators:

(screenshot)

2) It is possible to select which countries' indicators are displayed using the "filter" on the right-hand side of the screen:

(screenshot)

3) Clicking the "apply" button sends a POST request to the site, which refreshes the page to show only the information for the ticked boxes. Here is a screen capture showing the form elements sent in the POST request:

(screenshot)

But if I try to make this POST request with Python requests using the code below, it seems that the form is not processed, and the page returned is simply the default one.

payload = {
    'country[]': 5,
    'limit_from': '0',
    'submitFilters': '1',
    'timeFilter': 'timeRemain',
    'currentTab': 'today',
    'timeZone': '55',
}
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
         'X-Requested-With': 'XMLHttpRequest',
         'Accept':'*/*',
         'Accept-Encoding':'gzip, deflate, br',
         'Accept-Language':'en-US,en;q=0.8',
         'Connection':'keep-alive',
         'Host':'www.investing.com',
         'Origin':'https://www.investing.com',
         'Referer':'https://www.investing.com/economic-calendar/',
         'Content-Length':'94',
         'Content-Type':'application/x-www-form-urlencoded',
         'Cookie':'adBlockerNewUserDomains=1505902229; __qca=P0-734073995-1505902265195; __gads=ID=d69b337b0f60d8f0:T=1505902254:S=ALNI_MYlYKXUUbs8WtYTEO2fN9O_q9oykA; cookieConsent=was-set; travelDistance=4; editionPostpone=1507424197769; PHPSESSID=v9q2deffu2n0b9q07t3jkgk4a4; StickySession=id.71595783179.419www.investing.com; geoC=GB; gtmFired=OK; optimizelySegments=%7B%224225444387%22%3A%22gc%22%2C%224226973206%22%3A%22direct%22%2C%224232593061%22%3A%22false%22%2C%225010352657%22%3A%22none%22%7D; optimizelyEndUserId=oeu1505902244597r0.8410692836488942; optimizelyBuckets=%7B%228744291438%22%3A%228731763165%22%2C%228785438042%22%3A%228807365450%22%7D; nyxDorf=OT5hY2M1P2E%2FY24xZTE3YTNoMG9hYmZjPDdlYWFnNz0wNjNvYW5kYWU6PmFvbDM6Y2Y0MDAwYTk1MzdpYGRhPDk2YTNjYT82P2E%3D; billboardCounter_1=1; _ga=GA1.2.1460679521.1505902261; _gid=GA1.2.655434067.1508542678'
        }
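import lxml.html
import requests

# Post the filter form to the calendar page.
g = requests.post("https://www.investing.com/economic-calendar/", data=payload, headers=headers)

# Parse the returned HTML and collect the rows of the calendar table.
html = lxml.html.fromstring(g.text)
tr = html.xpath("//table[@id='economicCalendarData']//tr")

# Print the country shown in each row, skipping the first four rows.
for t in tr[4:]:
    print(t.find(".//td[@class='left flagCur noWrap']/span").attrib["title"])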

This is visible because if, for instance, I select only country "5" (the USA), post the request, and look at the countries present in the result page, I see other countries as well.

Does anyone know what I am doing wrong with this POST request?

Upvotes: 0

Views: 1009

Answers (1)

Iguananaut

Reputation: 23296

As your own screenshot shows, the site appears to post to the URL

https://www.investing.com/economic-calendar/Service/getCalendarFilteredData

whereas you're posting directly to

https://www.investing.com/economic-calendar/
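
One way to check this kind of mismatch is to build the request without sending it and compare it with what the browser's network tab shows. The sketch below is only an illustration: it assumes the endpoint above accepts the same form fields as in the question, it uses a trimmed set of headers (dropping the hard-coded Content-Length and Cookie is my assumption, since requests computes the length itself), and the helper name build_filter_request is hypothetical.

```python
import requests

# Endpoint taken from the screenshot mentioned above; not verified here.
FILTER_URL = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"

# Same form fields as in the question.
payload = {
    "country[]": 5,
    "limit_from": "0",
    "submitFilters": "1",
    "timeFilter": "timeRemain",
    "currentTab": "today",
    "timeZone": "55",
}

# Trimmed headers (an assumption): Content-Length and Content-Type are left
# for requests to fill in from the payload.
headers = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "https://www.investing.com/economic-calendar/",
}


def build_filter_request(url=FILTER_URL):
    """Prepare (but do not send) the POST so it can be inspected."""
    return requests.Request("POST", url, data=payload, headers=headers).prepare()


prepared = build_filter_request()

# Compare these with the request shown in the browser's developer tools.
print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])
print(prepared.headers["Content-Length"])
print(prepared.body)

# To actually send it (needs network access):
# with requests.Session() as session:
#     response = session.send(prepared)
#     print(response.status_code)
```

If the printed URL, content type, and body match what the browser sends, any remaining difference is likely down to cookies or other headers, which can be compared the same way.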

Upvotes: 1
