Mitchziie

Reputation: 43

Scraping a table from OECD

I'm trying to scrape a table from https://data.oecd.org/unemp/unemployment-rate.htm, specifically the table at https://data.oecd.org/chart/66NJ. I want to scrape the months at the top and all the values in the rows 'OECD - Total' and 'The Netherlands'.

After trying a lot of different code and searching this and other forums, I still can't figure out how to scrape this table. I have tried many different HTML selectors found via SelectorGadget or by inspecting an element in my browser, but I keep getting 'list of 0' or 'character empty'.

Any help would be appreciated.

library(tidyverse)
library(rvest)
library(XML)
library(magrittr)

#Get element data from one page
url<-"https://stats.oecd.org/sdmx-json/data/DP_LIVE/.HUR.TOT.PC_LF.M/OECD?json-lang=en&dimensionAtObservation=allDimensions&startPeriod=2016-08&endPeriod=2020-07"
  
#scrape all elements
content <- read_html(url)

#trying to load in a table (gives list of 0)
inladentable <- readHTMLTable(url)

#gather all months (gives 'character empty')
months <- content %>% 
  html_nodes(".table-chart-sort-link") %>%
  html_table()
  
#collect all values for the row 'OECD - Total'
wwpercentage<- content %>% 
  html_nodes(".table-chart-has-status-e") %>%
  html_text()
  
# Combine into a tibble
wwtable <- tibble(months=months,wwpercentage=wwpercentage)

Upvotes: 4

Views: 317

Answers (1)

Waldi

Reputation: 41260

The URL returns JSON, not HTML, which is why the HTML selectors match nothing.
You can query it using httr and jsonlite:

library(httr)
library(jsonlite)

# Fetch the raw response body as text, then parse it as JSON
url <- "https://stats.oecd.org/sdmx-json/data/DP_LIVE/.HUR.TOT.PC_LF.M/OECD?json-lang=en&dimensionAtObservation=allDimensions&startPeriod=2016-08&endPeriod=2020-07"
res <- GET(url)
res <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
res

#> $header
#> $header$id
#> [1] "98b762f3-47aa-4e28-978a-a4a6f6b3995a"
#> 
#> $header$test
#> [1] FALSE
#> 
#> $header$prepared
#> [1] "2020-09-30T21:58:10.5763805Z"
#> 
#> $header$sender
#> $header$sender$id
#> [1] "OECD"
#> 
#> $header$sender$name
#> [1] "Organisation for Economic Co-operation and Development"
#> 
#> 
#> $header$links
#>                                                                                                                                                              href
#> 1 https://stats.oecd.org:443/sdmx-json/data/DP_LIVE/.HUR.TOT.PC_LF.M/OECD?json-lang=en&dimensionAtObservation=allDimensions&startPeriod=2016-08&endPeriod=2020-07
#>       rel
#> 1 request
#> 
#> 
#> $dataSets
#>        action observations.0:0:0:0:0:0 observations.0:0:0:0:0:1
#> 1 Information   5.600849, 0.000000, NA   5.645914, 0.000000, NA
...
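
The observation names in the output (for example 0:0:0:0:0:1) look like colon-separated, zero-based positions into the response's dimension lists, which is how SDMX-JSON normally encodes observations. As a rough sketch only, the helper below turns such a response into one row per observation. It assumes the JSON was parsed with simplifyVector = FALSE and that the dimension metadata sits under res$structure$dimensions$observation, a part of the response not shown in the output above. sdmx_to_df and the mock object are hypothetical names with made-up values, used so the example runs without a network call.

```r
# Hypothetical helper (not part of httr or jsonlite): flatten a parsed
# SDMX-JSON response into one row per observation.
# ASSUMPTIONS: `res` comes from jsonlite::fromJSON(txt, simplifyVector = FALSE)
# and dimension metadata lives in res$structure$dimensions$observation.
sdmx_to_df <- function(res) {
  dims <- res$structure$dimensions$observation
  obs  <- res$dataSets[[1]]$observations

  rows <- lapply(names(obs), function(key) {
    # A key such as "0:1" holds zero-based positions, one per dimension
    idx <- as.integer(strsplit(key, ":", fixed = TRUE)[[1]]) + 1L
    ids <- mapply(function(d, i) d$values[[i]]$id, dims, idx)
    names(ids) <- vapply(dims, function(d) d$id, character(1))
    # The first element of an observation is the value; NULL means missing
    v <- obs[[key]][[1]]
    if (is.null(v)) v <- NA_real_
    data.frame(as.list(ids), value = v, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy response with two dimensions and made-up values (not real OECD data)
mock <- list(
  structure = list(dimensions = list(observation = list(
    list(id = "LOCATION",
         values = list(list(id = "OECD"), list(id = "NLD"))),
    list(id = "TIME_PERIOD",
         values = list(list(id = "2016-08"), list(id = "2016-09")))
  ))),
  dataSets = list(list(observations = list(
    "0:0" = list(1.5, 0, NULL),
    "1:1" = list(2.5, 0, NULL)
  )))
)

out <- sdmx_to_df(mock)
print(out)
```

On the real response, the same function would be called after re-parsing with fromJSON(content(res, as = "text"), simplifyVector = FALSE). The rows for 'OECD - Total' and 'The Netherlands' could then be kept by filtering on the location column, assuming those rows carry the codes OECD and NLD.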

Upvotes: 2
