Reputation: 325
I am trying to automate scraping tables from webpages such as the Investing.com Economic Calendar. This is fairly straightforward in R if we are only interested in the default tab, which displays today's calendar. Here is the R code:
library(rvest)
library(dplyr)
Econ_webpage <- read_html("https://www.investing.com/economic-calendar/")
Indicators <- Econ_webpage %>% html_nodes("#economicCalendarData") %>%
html_table(fill = TRUE) %>% .[[1]] %>% .[-(1:3),- c(match("Imp.",colnames(.)),ncol(.))]
which produces the desired result displayed below.
> head(Indicators)
Time Cur. Event Actual Forecast Previous
4 19:50 JPY BoJ Summary of Opinions
5 19:50 JPY Exports (YoY) (Feb) 1.9% 12.3%
6 19:50 JPY Imports (YoY) (Feb) 17.1% 7.9%
7 19:50 JPY Trade Balance (Feb) -100B -944B
8 20:01 GBP Rightmove House Price Index (MoM) 0.8%
9 21:30 CNY House Prices (YoY) (Feb) 5.0%
However, if I want to scrape the table in the Tomorrow tab, I need to use a Selenium driver. I tried RSelenium but cannot get it to work on my machine, so I tried Selenium in Python instead. I use the following Python code:
import selenium
from selenium import webdriver
driver = webdriver.Chrome(executable_path=PATH_TO_CHROMEDRIVER)
driver.get("https://www.investing.com/economic-calendar/")
driver.find_element_by_id("timeFrame_tomorrow").click()
html = driver.page_source
Now I have the HTML containing the desired table data in a string, but I don't know how to parse it efficiently to reproduce the result of the R code. Could I call the rpy2 package, which allows running R code from within Python, or does someone know an easier way to extract the table in the same form as above? How do I parse this HTML string?
Upvotes: 2
Views: 596
Reputation: 887971
With RSelenium in R, we could try:
library(RSelenium)
library(XML)
rD <- rsDriver()
remDr <- rD[["client"]]
remDr$navigate("https://www.investing.com/economic-calendar/")
option <- remDr$findElement("id", "timeFrame_tomorrow")
option$clickElement()
res <- readHTMLTable((remDr$getPageSource()[[1]]))$economicCalendarData
res <- res[-1,]
head(res)
# Time Cur. Imp. Event Actual Forecast Previous
#2 02:30 GBP Investing.com GBP/USD Index 46.5%
#3 02:30 USD Investing.com Gold Index 65.6%
#4 02:30 USD Investing.com S&P 500 Index 70.7%
#5 02:30 CAD Investing.com USD/CAD Index 41.8%
#6 02:30 CHF Investing.com USD/CHF Index 53.8%
#7 02:30 AUD Investing.com AUD/USD Index 47.9%
remDr$close()
rD[["server"]]$stop()
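If you would rather stay in Python, the string returned by driver.page_source can be parsed without R. Below is a minimal sketch using only the standard library's html.parser. The names TableExtractor and parse_table are hypothetical helpers made up for this illustration, and the sketch assumes the calendar table still has the id economicCalendarData (as in the R code above) and that cells have explicit closing tags, which is normally the case for browser-serialized page source. pandas.read_html is another option if pandas and an HTML parser backend are installed.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect cell text for each row of one table, chosen by id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0    # <table> nesting depth, counted from the target table
        self.rows = []    # completed rows, each a list of cell strings
        self.row = None   # row currently being built
        self.cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Start counting at the target table; also count tables nested in it.
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth == 1:
            if tag == "tr":
                self.row = []
            elif tag in ("td", "th") and self.row is not None:
                self.cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1:
            if tag in ("td", "th") and self.cell is not None:
                # Join the fragments and collapse runs of whitespace.
                self.row.append(" ".join("".join(self.cell).split()))
                self.cell = None
            elif tag == "tr" and self.row is not None:
                self.rows.append(self.row)
                self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def parse_table(html, table_id="economicCalendarData"):
    """Return the rows of the table with the given id as lists of strings."""
    parser = TableExtractor(table_id)
    parser.feed(html)
    return parser.rows
```

With the Selenium session from the question, rows = parse_table(driver.page_source) would give one list per table row (header rows included), which you can then filter or load into a data frame as needed.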
Upvotes: 2