Reputation: 117
I am trying to scrape a table inside a tab using rvest, but html_elements() seems to ignore it.
library(tidyverse)
library(rvest)
URL.BNPF <- 'https://fundkis.com/en/funds/bnppf-privatesustainable-balanced/BE6294262298#navs'
html <- read_html(URL.BNPF)
test <- html %>%
  html_elements('#navs') %>%
  html_elements('.row')
The code works up to that point, but anything I have tried after that to extract the table itself (under the heading "Historical Net Asset Values (Quotes)") returns an empty list. I suspect that the table being inside a tab affects which class or id I need to look for.
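The empty result does not have to come from the tab markup: if the table is filled in by JavaScript after the page loads, the HTML that read_html() downloads simply does not contain it, whatever selector is used. A minimal offline sketch of what that looks like from rvest's side, using a made-up snippet (the markup and the `nav-table-placeholder` class are hypothetical, not the real fundkis page):

```r
library(rvest)

# Made-up, simplified stand-in for the server-rendered page (hypothetical
# markup): the tab and its rows exist, but the NAV table would only be
# inserted later by JavaScript, which rvest does not run.
page <- minimal_html('
  <div id="navs">
    <div class="row"><h3>Historical Net Asset Values (Quotes)</h3></div>
    <div class="row"><div class="nav-table-placeholder"></div></div>
  </div>
')

rows   <- html_elements(page, "#navs .row")
tables <- html_elements(page, "#navs table")

length(rows)    # the tab's rows are there
length(tables)  # but the static HTML holds no <table> to select
```

A quick way to check this on the real page is to search the downloaded HTML for the table's values; if they are absent, no selector will find them.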
Many thanks in advance.
Upvotes: 1
Views: 323
Reputation: 84465
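The data is pulled in dynamically from an API call after the page loads, and rvest does not run JavaScript, so the table is not in the HTML that read_html() downloads. You can pick up the share id from that initial HTML (it sits in the props attribute of the element named ReactRiskPart), pass it into the API call, and get the data back as JSON. I set the PageSize parameter of the API call to 2000, large enough to get all likely results.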
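If a single large page ever turns out to be too small, an alternative is to walk PageIndex until a short page comes back. This is a minimal sketch under unverified assumptions about the fundkis API: that PageIndex is 0-based (as the PageIndex=0 in the URL suggests) and that the last page returns fewer records than PageSize. fetch_all_pages() and fetch_navs() are hypothetical helper names, and max_pages caps the loop in case the API ignores PageIndex.

```r
library(jsonlite)

# Collect every page of a paged endpoint.
# fetch_page(index, size) is any function returning a list of records for one
# page (hypothetical helper). Assumes 0-based pages and that the final page
# holds fewer than `size` records; neither is confirmed for fundkis.
fetch_all_pages <- function(fetch_page, size = 500L, max_pages = 100L) {
  records <- list()
  for (index in seq(0L, max_pages - 1L)) {
    page <- fetch_page(index, size)
    records <- c(records, page)
    if (length(page) < size) break  # short (or empty) page: assume we are done
  }
  records
}

# Possible usage against the answer's endpoint (needs `share_id`, extracted as
# in the answer's code):
# fetch_navs <- function(index, size) {
#   read_json(sprintf("https://fundkis.com/api/fkdb/navs/%s?PageIndex=%d&PageSize=%d",
#                     share_id, index, size))
# }
# data <- fetch_all_pages(fetch_navs)
```

Passing the fetcher in as a function keeps the paging logic testable without network access.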
library(rvest)
library(stringr)
library(jsonlite)
r <- read_html('https://fundkis.com/en/funds/bnppf-privatesustainable-balanced/BE6294262298#navs')
# The share id is stored in the props attribute of the ReactRiskPart element
props <- r %>% html_element('[name=ReactRiskPart]') %>% html_attr('props')
share_id <- str_match(props, '"shareId"\\s*:\\s*"(.*?)"')[, 2]
# Query the NAV API directly; PageSize is large enough for all likely rows
api_url <- sprintf('https://fundkis.com/api/fkdb/navs/%s?PageIndex=0&PageSize=2000', share_id)
data <- read_json(api_url)
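The `[, 2]` after str_match() is what picks out the id: str_match() returns a character matrix whose first column is the whole match and whose following columns are the capture groups. A small offline check with a made-up props string (only the shareId key comes from the answer; the other field and the id value are invented). The pattern here allows optional whitespace around the colon, so it also matches JSON written without a space after the colon.

```r
library(stringr)

# Hypothetical props value; the real attribute will hold other fields,
# and "abc123" is an invented id.
props <- '{"lang": "en", "shareId": "abc123"}'

m <- str_match(props, '"shareId"\\s*:\\s*"(.*?)"')
m                 # column 1: the whole match, column 2: the captured id
share_id <- m[, 2]
share_id
```

The lazy `.*?` stops at the first closing quote, so neighbouring fields are not swallowed into the id.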
To get the result as a data frame:
library(tidyverse)  # also attaches dplyr, purrr and stringr
library(rvest)
library(jsonlite)
r <- read_html("https://fundkis.com/en/funds/bnppf-privatesustainable-balanced/BE6294262298#navs")
props <- r %>% html_element("[name=ReactRiskPart]") %>% html_attr("props")
share_id <- str_match(props, '"shareId"\\s*:\\s*"(.*?)"')[, 2]
api_url <- sprintf("https://fundkis.com/api/fkdb/navs/%s?PageIndex=0&PageSize=2000", share_id)
data <- read_json(api_url)
# Each element of `data` is one NAV record: make each a one-row data frame and stack them
df <- map_dfr(data, data.frame) %>%
  mutate(Date = as.Date(NavDate)) %>%  # assumes NavDate starts with an ISO "YYYY-MM-DD" date
  select(-FundShareId, -NavDate) %>%
  rename(Currency = NavCurrencyISO, `Net Asset` = TotalAum, VL = Nav, `Nb Shares` = NbShares) %>%
  relocate(Date, Currency, VL, `Nb Shares`, `Net Asset`)
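If the response is a JSON array of records, as the map_dfr() call implies, jsonlite::fromJSON() can simplify it straight into a data frame, which makes the map_dfr(data, data.frame) step unnecessary. An offline sketch with two made-up records: the field names are taken from the answer's code, the values are invented. With the live API one would call fromJSON(api_url) in place of read_json(api_url).

```r
library(jsonlite)
library(dplyr)

# Two made-up NAV records; field names follow the answer, values are invented.
json <- '[
  {"FundShareId": 1, "NavDate": "2021-10-08T00:00:00", "NavCurrencyISO": "EUR",
   "Nav": 100.5, "NbShares": 10, "TotalAum": 1005},
  {"FundShareId": 1, "NavDate": "2021-10-07T00:00:00", "NavCurrencyISO": "EUR",
   "Nav": 100.1, "NbShares": 10, "TotalAum": 1001}
]'

# fromJSON() simplifies an array of objects into a data.frame
navs <- fromJSON(json) %>%
  mutate(Date = as.Date(NavDate)) %>%  # as.Date() reads the leading YYYY-MM-DD
  select(-FundShareId, -NavDate) %>%
  rename(Currency = NavCurrencyISO, `Net Asset` = TotalAum, VL = Nav, `Nb Shares` = NbShares) %>%
  relocate(Date, Currency, VL, `Nb Shares`, `Net Asset`)

navs
```

fromJSON() fills missing fields with NA when simplifying, much as binding one-row data frames does, so the two routes should give similar tables for well-formed records.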
Upvotes: 1