LoveMYMAth

Reading in a Table Using rvest

This link points to a table of ~290 vine plant names:

https://www.forestryimages.org/browse/catsubject.cfm?cat=51

I am trying to read in the table and keep the Common Name column. I have tried doing this with the rvest library like so:

library(rvest)
vine_web <- "https://www.forestryimages.org/browse/catsubject.cfm?cat=51"
vine_names <- vine_web %>%
  read_html() %>%
  html_table()

It reads the column names, but not the contents of the table. I have tried several variations using html_nodes(), html_element(), a CSS selector copied from the browser, and even the XPath.

I always end up with this as a result:

[[1]]
# A tibble: 1 x 4
  `Subject Number` `Common Name` `Scientific Name` `Number Of Images`
  <lgl>            <lgl>         <lgl>             <lgl>             
1 NA               NA            NA                NA                

The table appears to be generated dynamically, which leads me to believe that html_table() either needs different arguments or is the wrong function to use here. Is there a way to read this table into R?


Answers (1)

dcsuka

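It appears that the table's rows are loaded by JavaScript after the page opens, so read_html() only sees the HTML the server sends: the column headers, but no data. There is a workaround, though: the data can be downloaded in JSON form. If you open your browser's developer tools and go to the Network tab, you can find the API request the page makes for the table; requesting that URL directly returns the table as JSON. Let me know if this answers your question.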
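To illustrate why html_table() returns only headers, here is a minimal sketch using a made-up HTML snippet. The markup, including the single empty placeholder row, is an assumption standing in for what the server sends before any JavaScript runs; the real page's markup may differ. rvest parses only this static HTML, so it recovers the column names but no values:

```r
library(rvest)

# Hypothetical stand-in for the server-sent HTML (an assumption, not the real page):
# the header row is present, but the body holds only an empty placeholder row
# that JavaScript would normally fill in.
static_page <- minimal_html('
  <table>
    <thead>
      <tr><th>Subject Number</th><th>Common Name</th><th>Scientific Name</th><th>Number Of Images</th></tr>
    </thead>
    <tbody>
      <tr><td></td><td></td><td></td><td></td></tr>
    </tbody>
  </table>
')

# html_table() can only convert what is actually present in the parsed HTML
tbl <- static_page %>%
  html_element("table") %>%
  html_table()

print(tbl)
```

Because the placeholder cells are blank, each column is converted to a logical NA, giving the same 1 x 4 all-NA shape as the output in the question. No CSS selector or XPath can recover rows that are not in the HTML, which is why requesting the data directly is needed.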

library(jsonlite)
# JSON endpoint behind the table, as seen in the browser's Network tab.
# start=0 requests from the first row; length=500 is assumed to cover all ~290 rows
# (the copied URL's start=163&length=126 appears to return only a slice of the table).
json_data <- fromJSON("https://api.bugwood.org/rest/api/subject/.json?fmt=datatable&include=count&cat=51&systemid=2&draw=2&columns%5B0%5D%5Bdata%5D=0&columns%5B0%5D%5Bsearchable%5D=false&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bdata%5D=1&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=true&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B2%5D%5Bdata%5D=2&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=true&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B3%5D%5Bdata%5D=3&columns%5B3%5D%5Bsearchable%5D=false&columns%5B3%5D%5Borderable%5D=true&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D=&order%5B0%5D%5Bcolumn%5D=1&order%5B0%5D%5Bdir%5D=asc&start=0&length=500&search%5Bvalue%5D=&_=1657572710039")
result <- as.data.frame(json_data$data)
colnames(result) <- json_data$columns
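The post-processing in the code above relies on the response having a `data` field (the table rows) and a `columns` field (the header names). Here is a minimal offline sketch of that step using a small hand-written JSON string. Its shape is assumed from the code above, and the row values are placeholders, not real API output. It also converts the text columns to proper types and keeps only the Common Name column, which is what the question asks for:

```r
library(jsonlite)

# Hypothetical response shaped like the one the answer's code expects (an assumption):
# "columns" holds the header names, "data" holds one array per table row.
sample_json <- '{
  "columns": ["Subject Number", "Common Name", "Scientific Name", "Number Of Images"],
  "data": [
    ["1001", "Vine A", "Genus alpha", "12"],
    ["1002", "Vine B", "Genus beta", "3"]
  ]
}'

json_data <- fromJSON(sample_json)

# fromJSON() simplifies an array of equal-length string arrays into a character matrix
result <- as.data.frame(json_data$data, stringsAsFactors = FALSE)
colnames(result) <- json_data$columns

# Every value arrives as text; type.convert() picks integer or character per column
result[] <- lapply(result, type.convert, as.is = TRUE)

# Keep only the column the question is after
common_names <- result[["Common Name"]]
print(common_names)
```

Selecting by name assumes the real `columns` field matches the table headers; if it does not, selecting the second column by position (`result[[2]]`) is an alternative.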
