john

Reputation: 1036

Issue reading JSON

I have an XML file stored here. It contains JSON. I am unable to read it and convert it to a data frame.

Cleaning code

htext <- html_nodes(content, xpath=".//script[contains(., 'home_js_model')]") %>% html_text()
htext <- gsub("<script type=\"text/javascript\">", "", htext, fixed=TRUE)
htext <- gsub("var home_js_model = {", "", htext, fixed=TRUE)
htext <- gsub("</script>", "", htext, fixed=TRUE)
htext <- gsub("stock\":", "", htext, fixed=TRUE)

Read from JSON

json <- jsonlite::fromJSON(htext)

I also tried this, without success.

jsonlite::stream_in(textConnection(gsub("\\n", "", htext)))

Upvotes: 0

Views: 57

Answers (1)

Allan Cameron

Reputation: 174293

You are almost there. To parse the JSON, you need to trim the var home_js_model = from the start and the semicolon from the end. However, the result is a very long, very complicated nested list, so your parsing woes may just be starting...

jsonlite::fromJSON(substr(htext, 21, 5615711))
#> $stock
#> $stock$period_title
#> [1] "1T2018"
#>
#> $stock$total
#> [1] 4162848
#>
#> $stock$total_sale
#> [1] 3426559
#>
#> $stock$total_rental
#> [1] 736289
#>
#> $stock$total_es
#> [1] 2196851
#>
#> ... (very very long list)
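
The hard-coded offsets above only fit this exact file. A minimal sketch of the same trimming without fixed positions follows, assuming the JSON object starts at the first "{" in the script text and is followed only by a semicolon and whitespace. The htext value here is a small hypothetical stand-in, not the real page content.

```r
library(jsonlite)

# Hypothetical stand-in for the scraped script text (the real one is far longer)
htext <- 'var home_js_model = {"stock":{"period_title":"1T2018","total":4162848}};\n'

# Drop everything before the first "{" (the "var home_js_model = " prefix)
json_txt <- sub("^[^{]*", "", htext)

# Drop the trailing semicolon and any whitespace after it
json_txt <- sub(";[[:space:]]*$", "", json_txt)

parsed <- fromJSON(json_txt)

# Inspect only the top levels of the nested list before deciding what to flatten
str(parsed, max.level = 2)
```

On the real data, str() with a small max.level may help locate the sub-lists worth converting to a data frame, rather than trying to flatten the whole structure at once.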

Upvotes: 1
