colin

Reputation: 2666

Reading a JSON file into R from the internet - trouble with </html> lines

I am trying to read the following JSON database into R using the jsonlite package.

library(jsonlite)
db <- fromJSON("http://www.stbates.org/funguild_db.php", flatten=TRUE)

Doing this throws the following error:

Error in parse_con(txt, bigint_as_char) : 
  lexical error: invalid char in json text.
                                       <html> <head> <title>funguild_d
                     (right here) ------^

Clearly the parser does not like these characters. Is there a simple workaround that I am missing?

Upvotes: 1

Views: 670

Answers (1)

SymbolixAU

Reputation: 26258

@MrFlick is right that this is not a good way to serve data, but as always, there are ways around it. Here I'm using rvest to scrape the entire page as text, which drops the HTML tags but leaves the text of the <title> element in front of the JSON. I then strip out that first string, which happens to be the final part of the URL (minus the .php extension), and parse what is left.

url <- "http://www.stbates.org/funguild_db.php"

library(rvest)
library(jsonlite)

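# Scrape the page and keep only its text; the <title> text "funguild_db"
# ends up in front of the JSON payload
js <- url %>% 
    read_html() %>%
    html_text() 

# Drop only the first occurrence of the title text, then parse the rest as JSON
js <- jsonlite::fromJSON(sub("funguild_db", "", js, fixed = TRUE))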

head(js[, 1:5])

#                       $oid                  taxon taxonomicLevel trophicMode          guild
# 1 58f450f1791497fd28ebfccc Xanthomonas campestris             20  Pathotroph Plant Pathogen
# 2 58f450f1791497fd28ebfccd  Xanthomonas juglandis             20  Pathotroph Plant Pathogen
# 3 58f450f1791497fd28ebfcce         Xanthoparmelia             13 Symbiotroph     Lichenized
# 4 58f450f1791497fd28ebfccf           Xanthopeltis             13 Symbiotroph     Lichenized
# 5 58f450f1791497fd28ebfcd0            Xanthopsora             13 Symbiotroph     Lichenized
# 6 58f450f1791497fd28ebfcd1         Xanthopsorella             13 Symbiotroph     Lichenized
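
If you would rather not hard-code the page title, one possible alternative is to cut the text down to the span between the first opening bracket/brace and the last closing one before parsing. This is only a minimal sketch: `extract_json()` is a hypothetical helper name, and `page_text` is a made-up stand-in for what `html_text()` returns, so the example runs without hitting the server. It assumes the text surrounding the JSON contains no bracket or brace characters of its own.

```r
library(jsonlite)

# Made-up stand-in for the scraped page text: title text followed by the JSON
page_text <- ' funguild_db [{"taxon":"Xanthoparmelia","trophicMode":"Symbiotroph"},{"taxon":"Xanthopeltis","trophicMode":"Symbiotroph"}] '

# Hypothetical helper: keep everything from the first "[" or "{" up to the
# last "]" or "}", and return that substring
extract_json <- function(txt) {
  start   <- regexpr("[[{]", txt)          # position of first "[" or "{"
  closers <- gregexpr("[]}]", txt)[[1]]    # positions of every "]" or "}"
  if (start == -1 || closers[1] == -1) {
    stop("no JSON payload found in text")
  }
  substr(txt, start, max(closers))
}

db <- fromJSON(extract_json(page_text), flatten = TRUE)
head(db)
```

With the real page you would pass the result of `html_text()` to `extract_json()` in place of `page_text`.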

Upvotes: 2
