Reputation: 57
I am completely new to web scraping with R, and I would like to scrape the table on the page below, whose rows sit inside a tbody element. When I run the following code, I get only the column headers and none of the data (the website is in Czech).
I expect to get the time, price, volume, and volume in CZK for the orders placed there.
library(rvest)
library(dplyr)
PSE_Page <- "https://www.pse.cz/detail/CZ0003519753?tab=detail-trading-data"
Page <- read_html(PSE_Page)
Our_table <- Page %>%
  rvest::html_nodes('body') %>%
  xml2::xml_find_all("//div[contains(@class, 'stock-table large-table small-text page-block-negative-margin table-container js-swipe-icon')]") %>%
  rvest::html_text()
Our_table
Output: [1] "\n Čas\n Cena\n Celkový objem\n Celkový objem\n
(Čas = time, Cena = price, Celkový objem = total volume.)
Can somebody help? Thanks a lot!!!
Upvotes: 0
Views: 1049
Reputation: 109
library(tidyverse)
library(rvest)
header <- html_elements(xpath='/html/body/div[1]/table')
header
{xml_nodeset (1)} <table><tr><td width="100%"> <div id="logo"> <table width="100%"><tr><td valign="top"> <a href="https://www.m "
header <- html_elements(xpath='/html/body/div[1]/table/tr')
header
{xml_nodeset (1)} <tr/><td width="100%"> <div id="logo"> <table width="100%"><tr>\n<td valign="top"> <a href="https://www.maizegdb ."
Upvotes: 0
Reputation: 118
The table you're referring to is not a static table. It is dynamic: you can interact with it, e.g. by sorting it, and its contents are filled in by the browser after the page loads. rvest on its own reads only the HTML returned by the server, which is why you see the headers but not the data. I'm no expert in dynamic web scraping, but this code snippet extracts the data. It uses the RSelenium package to control a web browser from within R, so that the table's dynamic content is rendered before it is read. There are probably better solutions out there for this job, though.
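library(RSelenium)
library(dplyr)
# Start a Selenium server and a Firefox session controlled from R
rD <- rsDriver(browser = "firefox", port = 8787L)
remDr <- rD$client
remDr$navigate("https://www.pse.cz/detail/CZ0003519753?tab=detail-trading-data")
# Parse the page source as rendered by the browser, then shut everything down
page <- XML::htmlParse(remDr$getPageSource()[[1]])
remDr$close()
rD$server$stop()
# Extract the text of the table header and of the table body
header <- XML::xpathSApply(page, "/html/body/div[8]/div[2]/div/div[2]/div[3]/div/div/table/thead", XML::xmlValue)
table <- XML::xpathSApply(page, "/html/body/div[8]/div[2]/div/div[2]/div[3]/div/div/table/tbody", XML::xmlValue)
# Split each text blob into one value per line
header <- read.table(text = header, sep = "\n", strip.white = TRUE) %>% unlist() %>% as.character()
body <- read.table(text = table, sep = "\n", strip.white = TRUE)
# Columns 3 and 4 share the same header text, so give them distinct names
header[3] <- "Total Turnover pcs"
header[4] <- "Total Turnover CZK"
# header is recycled over the cell values, which groups them by column
data.frame(lapply(split(body$V1, paste(header)), as.character))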
#     Price     Time Total.Turnover.CZK Total.Turnover.pcs
# 1 95,00 % 12:00:25     CZK 780,333.33        800,000 pcs
# 2 95,00 % 12:00:08     CZK 292,625.00        300,000 pcs
# 3 95,00 % 12:00:08     CZK 195,083.33        200,000 pcs
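One possible variation, offered only as a sketch: once the browser has rendered the page, the source string can be handed back to rvest, and html_table() can pair each cell with its header. That avoids the long absolute XPath and the manual splitting. The HTML below is a hypothetical, simplified stand-in for the rendered page (the real markup has more columns and may differ), and the selector "div.stock-table table" is an assumption based on the class name in the question's XPath.

```r
library(rvest)

# Hypothetical stand-in for the string returned by remDr$getPageSource()[[1]];
# the real page has more columns and its markup may differ.
rendered_html <- '
<html><body>
  <div class="stock-table">
    <table>
      <thead><tr><th>Time</th><th>Price</th><th>Volume</th></tr></thead>
      <tbody>
        <tr><td>12:00:25</td><td>95,00 %</td><td>800,000 pcs</td></tr>
        <tr><td>12:00:08</td><td>95,00 %</td><td>300,000 pcs</td></tr>
      </tbody>
    </table>
  </div>
</body></html>'

# Parse the rendered source, pick the table inside the (assumed) container,
# and convert it to a data frame with one column per header cell
trades <- read_html(rendered_html) %>%
  html_element("div.stock-table table") %>%
  html_table()

print(trades)
```

In the RSelenium script above, rendered_html would be replaced by remDr$getPageSource()[[1]]. If two columns share the same header text, as they do on the real page, they would still need to be renamed afterwards.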
Upvotes: 1