anarchy

Using Python requests to POST: how do I get the table data for the dates I request?

I am trying to get historical economic calendar data from https://www.investing.com/economic-calendar/ for the date range 1 Feb 2020 to 5 Feb 2020.

Today is 4 Feb 2020.

If I POST to https://www.investing.com/economic-calendar/ with the code below, I can extract the table with BeautifulSoup, but I cannot select any day except the current one: the table my Python script saves is always for 4 Feb 2020, which is today.

import requests
import pandas as pd
from bs4 import BeautifulSoup

payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
                "dateFrom":"2020-02-01",
                "dateTo":"2020-02-05",
                "timeZone":"8",
                "timeFilter":"timeRemain",
                "currentTab":"custom",
                "limit_from":"0"}

urlheader = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}

url = "https://www.investing.com/economic-calendar/"

req = requests.post(url, data=payload, headers=urlheader)
print(req)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('table', id="economicCalendarData")

The table variable looks like this: [screenshot: table variable]

I can see that the page sends a POST request to "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData" whenever I change the date range or the filter settings.
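The payload dictionary in the code above is sent as a form-encoded body. A minimal sketch, using only the standard library, of how a dict with a list value such as country[] is encoded: each list item becomes a repeated key. As far as I know, requests encodes list values passed via data= the same way. The shortened country list is illustrative only.

```python
from urllib.parse import parse_qs, urlencode

# A shortened version of the question's payload (two country IDs, for brevity).
payload = {
    "country[]": ["25", "32"],
    "dateFrom": "2020-02-01",
    "dateTo": "2020-02-05",
    "timeZone": "8",
    "timeFilter": "timeRemain",
    "currentTab": "custom",
    "limit_from": "0",
}

# doseq=True expands the list into one key=value pair per item;
# the brackets in "country[]" are percent-encoded as %5B%5D.
body = urlencode(payload, doseq=True)
print(body)

# Decoding the body again recovers both country IDs under the same key.
print(parse_qs(body)["country[]"])
```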

Here is the request data I found: [screenshot: request data]

Here is the POST link: [screenshot: post link]

So, to be able to select the dates, I use the following code instead, which posts to that URL:

import requests
import pandas as pd
from bs4 import BeautifulSoup

payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
                "dateFrom":"2020-02-01",
                "dateTo":"2020-02-05",
                "timeZone":"8",
                "timeFilter":"timeRemain",
                "currentTab":"custom",
                "limit_from":"0"}

urlheader = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}

url = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"

req = requests.post(url, data=payload, headers=urlheader)
print(req)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('table', id="economicCalendarData")

But this time there is no element with the id economicCalendarData, so soup.find() returns None and the table variable is empty. The soup variable has data in it, but no table data.
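One way to see what the endpoint actually returned is to try decoding the body as JSON before handing anything to BeautifulSoup. A minimal sketch, assuming (as the answer below relies on) that the response is a JSON object whose "data" field holds the table rows as an HTML string; the helper name extract_calendar_html and the sample body are hypothetical.

```python
import json


def extract_calendar_html(body: str) -> str:
    """Return the HTML fragment stored under "data" in a JSON response body.

    Hypothetical helper: assumes getCalendarFilteredData returns a JSON object
    whose "data" key contains the table rows as an HTML string.
    """
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError as exc:
        # Not JSON at all: probably a full HTML page came back instead.
        raise ValueError("response body is not JSON") from exc
    if not isinstance(parsed, dict) or "data" not in parsed:
        raise ValueError("JSON response has no 'data' field")
    return parsed["data"]


# Usage with a made-up body shaped like the assumed response.
# With requests you would pass req.text instead.
sample = '{"data": "<tr><td>Example row</td></tr>"}'
print(extract_calendar_html(sample))
```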

This is the table I'm trying to save.

[screenshot: table to save]

As I said earlier, if I use https://www.investing.com/economic-calendar/ as the URL, I only get the table for the current day (4 Feb 2020), no matter what dates I put in the payload (dateFrom, dateTo).

When I POST to https://www.investing.com/economic-calendar/Service/getCalendarFilteredData instead, the table comes up empty: the soup variable contains data, but not the table I requested. What am I doing wrong, and how do I save the tables for the dates I select?
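Since the dates in the payload are plain YYYY-MM-DD strings, a small helper can build the payload for any chosen range. A sketch under the assumption that the other fields keep the values from the question's payload; the function name calendar_payload is hypothetical, and whether the server accepts arbitrarily long ranges is not known.

```python
from datetime import date
from typing import Sequence


def calendar_payload(start: date, end: date, countries: Sequence[str]) -> dict:
    """Build the form payload for an inclusive date range.

    Hypothetical helper: field names and the fixed values are copied from the
    question's payload; only the dates and the country IDs vary.
    """
    if end < start:
        raise ValueError("end date must not be before start date")
    return {
        "country[]": list(countries),
        "dateFrom": start.isoformat(),  # date.isoformat() gives YYYY-MM-DD
        "dateTo": end.isoformat(),
        "timeZone": "8",
        "timeFilter": "timeRemain",
        "currentTab": "custom",
        "limit_from": "0",
    }


# The question's range, with two of its country IDs for brevity.
p = calendar_payload(date(2020, 2, 1), date(2020, 2, 5), ["25", "32"])
print(p["dateFrom"], p["dateTo"])
```

The resulting dict can be passed as data= to requests.post(), as in the answer below.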

Answers (1)

SIM

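You were really close. The getCalendarFilteredData endpoint responds with JSON rather than a full HTML page: the table rows come back as an HTML string in the response's data field, without the surrounding economicCalendarData table, so parse req.json()['data'] instead of req.content. If I understood your requirements correctly, the following should get you there: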

import requests
from bs4 import BeautifulSoup

url = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"

payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
                "dateFrom":"2020-02-01",
                "dateTo":"2020-02-05",
                "timeZone":"8",
                "timeFilter":"timeRemain",
                "currentTab":"custom",
                "limit_from":"0"}

req = requests.post(url, data=payload, headers={
    "User-Agent":"Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest"
    })
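# Parse the HTML fragment stored in the JSON "data" field
soup = BeautifulSoup(req.json()['data'],"lxml")
# Print the text of every header/data cell, one table row at a time
for items in soup.select("tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)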
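To save the table rather than print it, the row lists built in the loop above can be collected and written out. A minimal sketch using the standard csv module; the helper name write_rows_csv is hypothetical, and the rows below are made-up placeholders standing in for the lists the loop produces.

```python
import csv
import io


def write_rows_csv(rows, fh):
    """Write a list of row lists (cell texts from each <tr>) as CSV to fh.

    Empty rows, e.g. <tr> elements with no th/td cells, are skipped.
    Returns the number of rows written.
    """
    writer = csv.writer(fh)
    count = 0
    for row in rows:
        if row:
            writer.writerow(row)
            count += 1
    return count


# Placeholder rows standing in for the lists printed by the loop above.
rows = [["header A", "header B"], [], ["value 1", "value 2"]]

buf = io.StringIO()
n = write_rows_csv(rows, buf)
print(n)
print(buf.getvalue())
```

To write to a real file, open it with newline="" as the csv module documentation recommends, for example with open("calendar.csv", "w", newline="", encoding="utf-8") as fh: write_rows_csv(rows, fh), where the file name is only an example.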
