Reputation: 1036
I have an XML file stored here. It contains JSON. I am unable to read it and convert it to a data frame.
Cleaning code
htext <- html_nodes(content, xpath=".//script[contains(., 'home_js_model')]") %>% html_text()
htext <- gsub("<script type=\"text/javascript\">", "", htext, fixed=TRUE)
htext <- gsub("var home_js_model = {", "", htext, fixed=TRUE)
htext <- gsub("</script>", "", htext, fixed=TRUE)
htext <- gsub("stock\":", "", htext, fixed=TRUE)
Read from JSON
json <- jsonlite::fromJSON(htext)
I also tried this, but without success.
jsonlite::stream_in(textConnection(gsub("\\n", "", htext)))
Upvotes: 0
Views: 57
Reputation: 174293
You are almost there. To parse the JSON, you need to trim the var home_js_model = from the start and the semicolon from the end. However, the result is a very long, very complicated nested list, so your parsing woes may just be starting...
jsonlite::fromJSON(substr(htext, 21, 5615711))
#> $stock
#> $stock$period_title
#> [1] "1T2018"
#>
#> $stock$total
#> [1] 4162848
#>
#> $stock$total_sale
#> [1] 3426559
#>
#> $stock$total_rental
#> [1] 736289
#>
#> $stock$total_es
#> [1] 2196851
#>
#> ... (very very long list)
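The character positions passed to substr() above are specific to this one file. A minimal sketch of the same trimming done by pattern instead, assuming the script text always starts with var home_js_model = and ends with a semicolon: the htext value below is a small hypothetical stand-in for the real scraped text, and json_txt, scalars and stock_df are illustrative names. The last step shows one possible way to get a one-row data frame out of the scalar fields under stock; the deeper nested parts would still need their own handling.

```r
library(jsonlite)

# Hypothetical stand-in for the scraped script text (the real one is several MB)
htext <- 'var home_js_model = {"stock": {"period_title": "1T2018", "total": 4162848}};'

# Strip the leading JavaScript assignment, leaving the opening brace in place
json_txt <- sub("^[[:space:]]*var[[:space:]]+home_js_model[[:space:]]*=[[:space:]]*", "", htext)
# Strip the trailing semicolon and any whitespace around it
json_txt <- sub("[[:space:]]*;[[:space:]]*$", "", json_txt)

parsed <- jsonlite::fromJSON(json_txt)

# Keep only the length-one atomic fields of $stock and bind them into one row
scalars <- Filter(function(x) is.atomic(x) && length(x) == 1, parsed$stock)
stock_df <- as.data.frame(scalars)
stock_df
```

Note that, unlike the cleaning code in the question, this leaves the opening { and the "stock": key intact; removing either one makes the text invalid JSON.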
Upvotes: 1