Reputation: 43
I'm trying to scrape a table from https://data.oecd.org/unemp/unemployment-rate.htm, specifically the table at https://data.oecd.org/chart/66NJ. I want to scrape the months at the top and all the values in the rows 'OECD - Total' and 'The Netherlands'.
After trying many different approaches and searching this and other forums, I just can't figure out how to scrape this table. I have tried many different HTML selectors found via SelectorGadget or by inspecting an element in my browser, but I keep getting 'list of 0' or 'character empty'.
Any help would be appreciated.
library(tidyverse)
library(rvest)
library(XML)
library(magrittr)
# Get element data from one page
url <- "https://stats.oecd.org/sdmx-json/data/DP_LIVE/.HUR.TOT.PC_LF.M/OECD?json-lang=en&dimensionAtObservation=allDimensions&startPeriod=2016-08&endPeriod=2020-07"
# Scrape all elements
content <- read_html(url)
# Try to load a table (gives 'list of 0')
inladentable <- readHTMLTable(url)
# Gather all months (gives 'character empty')
months <- content %>%
  html_nodes(".table-chart-sort-link") %>%
  html_table()
# Collect all values for the row 'OECD - Total'
wwpercentage <- content %>%
  html_nodes(".table-chart-has-status-e") %>%
  html_text()
# Combine into a tibble
wwtable <- tibble(months = months, wwpercentage = wwpercentage)
Upvotes: 4
Views: 317
Reputation: 41260
This is JSON and not HTML. You can query it using httr and jsonlite:
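library(httr)
# Request the SDMX-JSON endpoint directly instead of parsing it as HTML
res <- GET("https://stats.oecd.org/sdmx-json/data/DP_LIVE/.HUR.TOT.PC_LF.M/OECD?json-lang=en&dimensionAtObservation=allDimensions&startPeriod=2016-08&endPeriod=2020-07")
# Read the response body as text and parse the JSON into R lists and data frames
res <- jsonlite::fromJSON(content(res, as = "text", encoding = "UTF-8"))
res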
#> $header
#> $header$id
#> [1] "98b762f3-47aa-4e28-978a-a4a6f6b3995a"
#>
#> $header$test
#> [1] FALSE
#>
#> $header$prepared
#> [1] "2020-09-30T21:58:10.5763805Z"
#>
#> $header$sender
#> $header$sender$id
#> [1] "OECD"
#>
#> $header$sender$name
#> [1] "Organisation for Economic Co-operation and Development"
#>
#>
#> $header$links
#> href
#> 1 https://stats.oecd.org:443/sdmx-json/data/DP_LIVE/.HUR.TOT.PC_LF.M/OECD?json-lang=en&dimensionAtObservation=allDimensions&startPeriod=2016-08&endPeriod=2020-07
#> rel
#> 1 request
#>
#>
#> $dataSets
#> action observations.0:0:0:0:0:0 observations.0:0:0:0:0:1
#> 1 Information 5.600849, 0.000000, NA 5.645914, 0.000000, NA
...
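To get from this parsed response to the months and the values per country, the observation keys (such as 0:0:0:0:0:0) have to be matched to the dimension values. The following is only a minimal sketch, not a tested solution for the live endpoint. It assumes the response follows the usual SDMX-JSON layout, where structure$dimensions$observation lists one entry per dimension and each colon-separated number in a key is a zero-based index into that dimension's values. The mock_json string and the sdmx_to_df() helper are hypothetical names made up for illustration, and the mock has only two dimensions and invented numbers so that it runs offline. Note that it parses with simplifyVector = FALSE, unlike the call above, because plain nested lists are easier to index by position.

```r
library(jsonlite)

# Hypothetical, heavily trimmed stand-in for the OECD response.
# The real response has six dimensions per key; the values here are made up.
mock_json <- '{
  "dataSets": [{
    "observations": {
      "0:0": [5.6, 0, null],
      "0:1": [5.5, 0, null],
      "1:0": [6.3, 0, null],
      "1:1": [6.2, 0, null]
    }
  }],
  "structure": {
    "dimensions": {
      "observation": [
        {"id": "LOCATION",
         "values": [{"id": "NLD", "name": "Netherlands"},
                    {"id": "OECD", "name": "OECD - Total"}]},
        {"id": "TIME_PERIOD",
         "values": [{"id": "2016-08", "name": "Aug-2016"},
                    {"id": "2016-09", "name": "Sep-2016"}]}
      ]
    }
  }
}'

# Hypothetical helper: turn a parsed SDMX-JSON list into a long data frame.
sdmx_to_df <- function(parsed) {
  obs  <- parsed$dataSets[[1]]$observations
  dims <- parsed$structure$dimensions$observation

  # Each key such as "1:0" holds one zero-based index per dimension.
  idx <- do.call(rbind, lapply(strsplit(names(obs), ":", fixed = TRUE), as.integer))

  # Look up the dimension value id for every observation, one column per dimension.
  cols <- lapply(seq_along(dims), function(j) {
    ids <- vapply(dims[[j]]$values, function(v) v$id, character(1))
    ids[idx[, j] + 1L]
  })
  names(cols) <- vapply(dims, function(d) d$id, character(1))
  out <- as.data.frame(cols, stringsAsFactors = FALSE)

  # The first element of each observation array is the observed value.
  out$value <- unname(vapply(obs, function(o) {
    if (is.null(o[[1]])) NA_real_ else as.numeric(o[[1]])
  }, numeric(1)))
  out
}

parsed <- fromJSON(mock_json, simplifyVector = FALSE)
unemployment <- sdmx_to_df(parsed)

# Keep only the rows for 'OECD - Total' and the Netherlands.
unemployment[unemployment$LOCATION %in% c("OECD", "NLD"), ]
```

For the real data, the same helper could be tried on fromJSON(content(res, as = "text"), simplifyVector = FALSE), where res is the raw GET() response. Whether the live dimension ids are really called LOCATION and TIME_PERIOD is an assumption worth checking first by inspecting the structure element of the response.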
Upvotes: 2