Jian Zhang

Reputation: 1217

Some troubles with Web scraping using R

I am having trouble scraping the text from this webpage: http://www.iplant.cn/info/Acer%20stachyophyllum?t=foc

What I need is the text in the center of the page: "Trees to 15 m tall, dioecious. ..." I tried the read_html function from the R package rvest, but it returned nothing. Could anyone help me with this? Thanks so much.

Upvotes: 0

Views: 53

Answers (1)

Allan Cameron

Reputation: 174586

This part of the page is generated by an XHR call, so it is not in the HTML that read_html downloads. You can get the description text for any species by calling that endpoint directly:

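get_description <- function(species_name)
{
  # Endpoint the page calls in the background to fetch the description
  url   <- "http://www.iplant.cn/ashx/getfoc.ashx"
  # Spaces in the species name become "+"; "m" is set to a random number
  query <- paste0("?key=", gsub(" ", "+", species_name),
                  "&key_no=&m=", runif(1), 9)
  # The response is JSON; the text is in its "Description" field
  jsonlite::fromJSON(paste0(url, query))$Description
}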

So for example:

get_description("Actaea asiatica")
#> [1] "<p>Rhizome black-brown, with numerous slender fibrous roots. 
#> Stems 30--80 cm tall, terete, 4--6(--9) mm in diam., unbranched, 
#> basally glabrous, apically white pubescent. Leaves 2 or 3, proximal 
#> cauline leaves 3 × ternately pinnate ...<truncated>

get_description("Acer stachyophyllum")
#> [1] "<p>Trees to 15 m tall, dioecious. Bark yellowish brown, smooth.
#> Branchlets glabrous. Leaves deciduous; petiole 2.5-8 cm, slightly 
#> pubescent near apex; leaf blade ovate or oblong, 5-11 × 2.5-6 cm, 
#> undivided or 3-lobed, papery, abaxially densely pale or white pubescent,
#>  becoming less so when mature or nearly glabrous, adaxially glabrous,
#> 3-5-veined at base abaxially, rarely with rudimentary...<truncated>
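
The returned string still contains HTML markup (it starts with "<p>"). If you only want plain text, a minimal sketch for cleaning it in base R could look like the following. Note that strip_html is a hypothetical helper, not part of any package, and sample_html is a made-up string that only mimics the output shown above. A regex like this is a crude approach that assumes simple markup; for anything more complex, a real HTML parser would be safer.

```r
# Hypothetical helper: crude removal of HTML tags from a character vector.
# Assumes simple markup such as <p>...</p>; it is not a full HTML parser.
strip_html <- function(x) {
  x <- gsub("<[^>]+>", " ", x)        # replace every tag with a space
  x <- gsub("[[:space:]]+", " ", x)   # collapse runs of whitespace
  trimws(x)                           # drop leading/trailing spaces
}

# Made-up sample in the same shape as the Description field
sample_html <- "<p>Trees to 15 m tall, dioecious.</p><p>Bark yellowish brown, smooth.</p>"
strip_html(sample_html)
#> [1] "Trees to 15 m tall, dioecious. Bark yellowish brown, smooth."
```

With the function from the answer, this would be used as strip_html(get_description("Acer stachyophyllum")).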

Upvotes: 1
