Reputation: 5152
I am trying to scrape a website, and to reach the right page I have to send a POST request.
The screenshots below show how I found the headers and payload needed for the request:
1) Here is the page: it is a list of economic indicators:
2) You can choose which countries' indicators are displayed using the filter on the right-hand side of the screen:
3) Clicking the "Apply" button sends a POST request to the site, which refreshes the page to show only the information for the ticked boxes. Here is a screen capture showing the form elements sent in the POST request:
But if I send this POST request with Python's requests library using the code below, the form does not seem to be processed, and the page returned is simply the default one.
payload= {
'country[]': 5,
'limit_from': '0',
'submitFilters': '1',
'timeFilter': 'timeRemain',
'currentTab': 'today',
'timeZone': '55'}
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest',
'Accept':'*/*',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'en-US,en;q=0.8',
'Connection':'keep-alive',
'Host':'www.investing.com',
'Origin':'https://www.investing.com',
'Referer':'https://www.investing.com/economic-calendar/',
'Content-Length':'94',
'Content-Type':'application/x-www-form-urlencoded',
'Cookie':'adBlockerNewUserDomains=1505902229; __qca=P0-734073995-1505902265195; __gads=ID=d69b337b0f60d8f0:T=1505902254:S=ALNI_MYlYKXUUbs8WtYTEO2fN9O_q9oykA; cookieConsent=was-set; travelDistance=4; editionPostpone=1507424197769; PHPSESSID=v9q2deffu2n0b9q07t3jkgk4a4; StickySession=id.71595783179.419www.investing.com; geoC=GB; gtmFired=OK; optimizelySegments=%7B%224225444387%22%3A%22gc%22%2C%224226973206%22%3A%22direct%22%2C%224232593061%22%3A%22false%22%2C%225010352657%22%3A%22none%22%7D; optimizelyEndUserId=oeu1505902244597r0.8410692836488942; optimizelyBuckets=%7B%228744291438%22%3A%228731763165%22%2C%228785438042%22%3A%228807365450%22%7D; nyxDorf=OT5hY2M1P2E%2FY24xZTE3YTNoMG9hYmZjPDdlYWFnNz0wNjNvYW5kYWU6PmFvbDM6Y2Y0MDAwYTk1MzdpYGRhPDk2YTNjYT82P2E%3D; billboardCounter_1=1; _ga=GA1.2.1460679521.1505902261; _gid=GA1.2.655434067.1508542678'
}
import lxml.html
import requests
g=requests.post("https://www.investing.com/economic-calendar/",data=payload,headers=headers)
html = lxml.html.fromstring(g.text)
tr=html.xpath("//table[@id='economicCalendarData']//tr")
for t in tr[4:]:
    print(t.find(".//td[@class='left flagCur noWrap']/span").attrib["title"])
I can tell because if, for instance, I select only country "5" (the USA), post the request, and look at the countries present in the result page, other countries show up as well.
Does anyone know what I am doing wrong with that POST request?
Upvotes: 0
Views: 1009
Reputation: 23296
As your own screenshot shows, the site posts to the URL
https://www.investing.com/economic-calendar/Service/getCalendarFilteredData
whereas you're posting directly to
https://www.investing.com/economic-calendar/
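A minimal sketch of the corrected request, reusing the form fields from the question and only swapping in the endpoint above. The helper names (`build_filter_request`, `encode_payload`, `fetch_filtered_calendar`) are hypothetical, and the trimmed-down header set is an assumption: the site may still require the cookies or other headers captured in the browser.

```python
from urllib.parse import urlencode

# Endpoint shown in the screenshot; the form fields mirror the question's payload.
FILTER_URL = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"


def build_filter_request(country_ids):
    """Return (url, headers, payload) for the filter POST (hypothetical helper)."""
    payload = {
        # A list value is sent as one repeated "country[]" field per country.
        "country[]": [str(c) for c in country_ids],
        "limit_from": "0",
        "submitFilters": "1",
        "timeFilter": "timeRemain",
        "currentTab": "today",
        "timeZone": "55",
    }
    # Assumption: this reduced header set is enough; add cookies if it is not.
    headers = {
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "https://www.investing.com/economic-calendar/",
    }
    return FILTER_URL, headers, payload


def encode_payload(payload):
    """Show the form-encoded body that would be sent for this payload."""
    return urlencode(payload, doseq=True)


def fetch_filtered_calendar(country_ids):
    """POST the filter form to the service URL and return the raw response text."""
    # Imported here so the helpers above can be used without requests installed.
    import requests

    url, headers, payload = build_filter_request(country_ids)
    response = requests.post(url, data=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text
```

Calling `fetch_filtered_calendar([5])` would send the filter for the USA only. The response body may not be a full HTML page, so it is worth inspecting it before feeding it to lxml.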
Upvotes: 1