Tomas

Reputation: 3

R scraping xpath

I am new to scraping and for a first task I decided to scrape this webpage: https://finstat.sk/databaza-financnych-udajov?EmployeeExact=False&RpvsInsert=False&Sort=assets&PerPage=20

Further down the page there is a table containing numeric data that I would like to scrape. Could you please help me with that? I tried this code:

library('rvest')


url <- 'https://finstat.sk/databaza-financnych-udajov?EmployeeExact=False&RpvsInsert=False&Sort=assets&PerPage=20'

webpage <- read_html(url)

tabulka <- html_nodes(webpage, xpath='/html/body/div[5]/div/div[3]/div[4]/div[2]/div/div/div[3]/table/tbody/tr[1]') %>%
    html_table() %>%

head(tabulka)

After I run this I get the error: length(n) == 1L is not TRUE

Output needed

Upvotes: 0

Views: 75

Answers (1)

udden2903

Reputation: 783

Maybe this:

library(rvest)
library(tidyverse)

scrape_data <- function(page_num) {
  # Download one page of results; page_num fills the Page= query parameter
  page <- read_html(sprintf("https://finstat.sk/databaza-financnych-udajov?EmployeeExact=False&RpvsInsert=False&Sort=assets&Page=%s", page_num))
  # Helper: trimmed text of every node matching a CSS selector
  get_text <- function(css) page %>% html_nodes(css) %>% html_text(trim = TRUE)
  # Columns 1-2 are selected by their own classes; columns 3-7 are the .nowrap cells
  first_two_cols <- lapply(c("td.data-table-column-pinned", "td.hidden-xs"), get_text) %>% data.frame()
  remaining_cols <- lapply(sprintf(".nowrap:nth-child(%s)", 3:7), get_text) %>% data.frame()
  cbind(first_two_cols, remaining_cols) %>% set_names(paste0("var", 1:7))
}

# The following scrapes 5 pages, but the number can be adjusted:
df <- map_df(1:5, scrape_data)
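
As for the error in the question: it is consistent with the trailing `%>%` after `html_table()`, which pipes the result straight into `head(tabulka)`, so `tabulka` ends up being passed as the `n` argument of `head()`. The XPath may also be part of the problem, since browsers often insert `tbody` when displaying a page even if it is absent from the raw HTML that `read_html()` receives, and `html_table()` is meant for a `<table>` node rather than a single `<tr>`. Below is a minimal sketch of the corrected pattern. It runs against a small made-up HTML snippet (the table contents and the `"table"` selector are assumptions for illustration, not taken from finstat.sk), so it works offline:

```r
library(rvest)

# Hypothetical stand-in for the real page, so the example runs without network access
page <- minimal_html('
  <table>
    <tr><th>Name</th><th>Assets</th></tr>
    <tr><td>Firm A</td><td>100</td></tr>
    <tr><td>Firm B</td><td>200</td></tr>
  </table>')

# Select the <table> node itself (not a single <tr>) and end the pipe here
tabulka <- page %>%
  html_element("table") %>%
  html_table()

# Call head() as a separate statement, not as part of the pipe
head(tabulka)
```

On the real page the selector would have to be adapted to whatever actually identifies the table in the downloaded HTML.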

Upvotes: 1
