user3612816

Reputation: 325

Read dynamic webpage html in either Python or R

I am trying to automate scraping tables from web pages such as the Investing.com Economic Calendar. This is fairly straightforward in R if we are only interested in the default tab, which displays today's calendar. Here is the R code:

library(rvest)
library(dplyr)

Econ_webpage <- read_html("https://www.investing.com/economic-calendar/")

Indicators <- Econ_webpage %>%
  html_nodes("#economicCalendarData") %>%
  html_table(fill = TRUE) %>%
  .[[1]] %>%
  .[-(1:3), -c(match("Imp.", colnames(.)), ncol(.))]

which produces the desired result displayed below.

> head(Indicators)
   Time Cur.                             Event Actual Forecast Previous 
4 19:50  JPY           BoJ Summary of Opinions                          
5 19:50  JPY              Exports (YoY)  (Feb)            1.9%    12.3% 
6 19:50  JPY              Imports (YoY)  (Feb)           17.1%     7.9% 
7 19:50  JPY              Trade Balance  (Feb)           -100B    -944B 
8 20:01  GBP Rightmove House Price Index (MoM)                     0.8% 
9 21:30  CNY         House Prices (YoY)  (Feb)                     5.0%

However, to scrape the table in the Tomorrow tab I need a Selenium driver. I tried RSelenium but cannot get it to work on my machine, so I tried Selenium in Python with the following code:

import selenium
from selenium import webdriver 

# PATH_TO_CHROMEDRIVER is a placeholder for the local chromedriver path
driver = webdriver.Chrome(executable_path=PATH_TO_CHROMEDRIVER)
driver.get("https://www.investing.com/economic-calendar/")
driver.find_element_by_id("timeFrame_tomorrow").click()
html = driver.page_source

Now I have the HTML containing the desired table data as a string, but I don't know how to parse it efficiently to reproduce the result of the R code. Can I call the rpy2 package, which allows running R code within Python, or does someone know an easier way to extract the table in the same form as above? How do I parse this HTML string?

Upvotes: 2

Views: 596

Answers (1)

akrun

Reputation: 887971

With RSelenium in R we could try:

library(RSelenium)
library(XML)

rD <- rsDriver()
remDr <- rD[["client"]]
remDr$navigate("https://www.investing.com/economic-calendar/")
option <- remDr$findElement("id", "timeFrame_tomorrow")
option$clickElement()
res <- readHTMLTable((remDr$getPageSource()[[1]]))$economicCalendarData
res <- res[-1,]
head(res)
#   Time Cur. Imp.                       Event Actual Forecast Previous 
#2 02:30  GBP      Investing.com GBP/USD Index                    46.5% 
#3 02:30  USD         Investing.com Gold Index                    65.6% 
#4 02:30  USD      Investing.com S&P 500 Index                    70.7% 
#5 02:30  CAD      Investing.com USD/CAD Index                    41.8% 
#6 02:30  CHF      Investing.com USD/CHF Index                    53.8% 
#7 02:30  AUD      Investing.com AUD/USD Index                    47.9% 


remDr$close()
rD[["server"]]$stop() 
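
If staying in Python is preferred, the page source string from the question can probably be parsed without calling R at all. Below is a minimal sketch using only the standard library's html.parser. It assumes the table still has the id economicCalendarData (taken from the R selector above). The names TableByIdParser and parse_table are made up for illustration, and the sample string is a hand-written stand-in for driver.page_source, not the real page. Where pandas is installed, pandas.read_html with attrs={"id": "economicCalendarData"} is likely a shorter route to a data frame.

```python
from html.parser import HTMLParser


class TableByIdParser(HTMLParser):
    """Collect the rows of any <table> whose id equals table_id (hypothetical helper)."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # <table> nesting depth, counted from the target table
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Start counting at the target table; keep counting for nested tables.
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth == 1:
            if tag == "tr":
                self._row = []
            elif tag in ("td", "th") and self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1:
            if tag in ("td", "th") and self._cell is not None:
                # Join the fragments and collapse runs of whitespace.
                self._row.append(" ".join("".join(self._cell).split()))
                self._cell = None
            elif tag == "tr" and self._row is not None:
                self.rows.append(self._row)
                self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_table(html, table_id="economicCalendarData"):
    """Return the rows of the table with the given id as a list of lists of strings."""
    parser = TableByIdParser(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows


# Hand-written stand-in for driver.page_source (an assumption, not the real markup).
sample = """
<table id="other"><tr><td>ignored</td></tr></table>
<table id="economicCalendarData">
  <tr><th>Time</th><th>Cur.</th><th>Event</th></tr>
  <tr><td>19:50</td><td> JPY </td><td>Exports (YoY)  (Feb)</td></tr>
</table>
"""
rows = parse_table(sample)
header, body = rows[0], rows[1:]
```

With the Selenium session from the question, the call would be parse_table(driver.page_source). The real table may contain extra header or separator rows, so some rows would likely need to be dropped afterwards, much as the R code drops the first three.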

Upvotes: 2
