Reputation: 3
I am new to scraping, and as a first task I decided to scrape this webpage: https://finstat.sk/databaza-financnych-udajov?EmployeeExact=False&RpvsInsert=False&Sort=assets&PerPage=20
Further down the page there is a table containing numeric information that I would like to scrape. Could you please help me with that? I tried this code:
library('rvest')
url <- 'https://finstat.sk/databaza-financnych-udajov?EmployeeExact=False&RpvsInsert=False&Sort=assets&PerPage=20'
webpage <- read_html(url)
tabulka <- html_nodes(webpage, xpath='/html/body/div[5]/div/div[3]/div[4]/div[2]/div/div/div[3]/table/tbody/tr[1]') %>%
html_table() %>%
head(tabulka)
After I run this I get the error: length(n) == 1L is not TRUE
Upvotes: 0
Views: 75
Reputation: 783
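Maybe something like this, which selects each column by its CSS selector and combines the results into a data frame: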
library(rvest)
library(tidyverse)
# Scrape one results page and return its table as a 7-column data frame.
scrape_data <- function(page_no) {
  page <- read_html(sprintf("https://finstat.sk/databaza-financnych-udajov?EmployeeExact=False&RpvsInsert=False&Sort=assets&Page=%s", page_no))
  # The first two columns are selected by their cell classes.
  first_two_cols <- lapply(c("td.data-table-column-pinned", "td.hidden-xs"), function(css) page %>% html_nodes(css) %>% html_text(trim = TRUE)) %>% data.frame()
  # Columns 3 to 7 are selected by the "nowrap" class and their position within the row.
  remaining_cols <- lapply(3:7, function(i) page %>% html_nodes(sprintf(".nowrap:nth-child(%s)", i)) %>% html_text(trim = TRUE)) %>% data.frame()
  cbind(first_two_cols, remaining_cols) %>% set_names(paste0("var", 1:7))
}
# The following scrapes the first 5 pages; adjust the range as needed:
df <- map_df(1:5, scrape_data)
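To see how the column-by-column selection works without hitting the site, here is a minimal sketch on a small inline HTML snippet. The markup, firm names, and values below are made up for illustration and only borrow the class names from the selectors above; the real page may be structured differently.

library(rvest)

# Hypothetical stand-in for one results page (not the real finstat.sk markup).
html <- '<table>
<tr><td class="data-table-column-pinned">Firm A</td><td class="hidden-xs">Bratislava</td><td class="nowrap">100</td></tr>
<tr><td class="data-table-column-pinned">Firm B</td><td class="hidden-xs">Kosice</td><td class="nowrap">200</td></tr>
</table>'
page <- read_html(html)

# One CSS selector per column; each selector matches one cell per table row.
selectors <- c("td.data-table-column-pinned", "td.hidden-xs", ".nowrap:nth-child(3)")
cols <- lapply(selectors, function(css) html_text(html_nodes(page, css), trim = TRUE))

# Name the columns, then bind the equal-length vectors into a data frame.
names(cols) <- paste0("var", seq_along(cols))
toy <- as.data.frame(cols, stringsAsFactors = FALSE)
print(toy)

This approach relies on every selector matching exactly one cell per row. If one selector matches more or fewer cells than the others, the columns will have different lengths and the data frame cannot be built, so it is worth checking the lengths when adapting this to another page.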
Upvotes: 1