Reputation: 19
I want to scrape all the codes, including the codes under each level of the hierarchy shown in the left panel of this website, using the R package rvest.
URL: http://apps.who.int/classifications/icd10/browse/2016/en/
To begin with, I tried this code:
library(rvest)
url <- "http://apps.who.int/classifications/icd10/browse/2016/en/"
src <- read_html(url)
ATC <- src %>% html_node("a.ygtvlabel") %>% html_text()
a.ygtvlabel is the element and class I see when hovering over the text in the web page.
However, it just returns NA_character_. I see that the HTML source for the page does not directly contain these codes (e.g. "parasitic diseases"); instead, it probably has an href to all the contents.
How can I go about scraping such a page? Kindly advise.
Upvotes: 0
Views: 571
Reputation: 4378
As with many pages of this kind, this page makes a background request for a JSON file that contains the data. You can discover this by opening the browser's debug tools and looking at the network requests. Using an API, as noted in the comments, is a better choice.
library(httr)
library(jsonlite)
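## original url <- "http://apps.who.int/classifications/icd10/browse/2016/en/"
## endpoint the page requests in the background (seen in the browser's network tab)
json_url <- "http://apps.who.int/classifications/icd10/browse/2016/en/JsonGetRootConcepts?useHtml=false"
## fetch the raw response body and convert it to a character string
json_data <- rawToChar(GET(json_url)$content)
## parse the JSON array into a data frame
categories <- fromJSON(json_data)
categories$label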
# [1] "I Certain infectious and parasitic diseases"
# [2] "II Neoplasms"
# [3] "III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism"
# [4] "IV Endocrine, nutritional and metabolic diseases"
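Each label returned above starts with the chapter's Roman numeral followed by its title, so it may be handy to split the two. Here is a minimal sketch of that step. To keep it runnable without a network call, it parses a made-up JSON string that only mimics the label field; the real response may contain other fields, and the chapter and title column names are my own choice.

```r
library(jsonlite)

# Illustrative sample only: mimics the `label` field of the real response
sample_json <- '[
  {"label": "I Certain infectious and parasitic diseases"},
  {"label": "II Neoplasms"}
]'

# fromJSON simplifies an array of objects into a data frame
categories <- fromJSON(sample_json)

# Everything before the first space is the chapter numeral
categories$chapter <- sub(" .*$", "", categories$label)
# Everything after the first space is the chapter title
categories$title <- sub("^[^ ]+ ", "", categories$label)

categories[, c("chapter", "title")]
```

With the live data you would skip sample_json and apply the two sub() calls to the categories data frame built from json_url.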
Upvotes: 1