Reputation: 2396
I'm trying to download a file and load it into R, but it's not working. I'm on a Mac, using R 3.1.3.
The file is in csv format (and there's an option of json format).
Here's the url for the file (csv and json): http://dadosabertos.dataprev.gov.br/opendata/con02/formato=csv http://dadosabertos.dataprev.gov.br/opendata/con02/formato=json
I know I can download the file, open it in a local text editor, save it as UTF-8, and then import it into R. But I'd like a more automated solution that doesn't involve other software. And, by the way, even this workaround isn't as easy as I expected.
Here's what I tried so far. Since the file is in Portuguese, I assumed it's probably UTF-8:
library(jsonlite)
options(encoding = "utf-8")
url <- "http://dadosabertos.dataprev.gov.br/opendata/con02/formato=json"
prev <- fromJSON(url)
And the error message:
lexical error: invalid bytes in UTF8 string.
:[{"node":{"Ano":"1988","Esp�cie":"42-Ap Tempo Contribui��o
(right here) ------^
I also tried:
url1 <- "http://dadosabertos.dataprev.gov.br/opendata/con02/formato=csv"
prev <- read.csv(url1, sep=",")
That didn't work either. I also tried:
Sys.setlocale("LC_ALL", 'en_US.UTF-8')
But it didn't make any difference.
Upvotes: 0
Views: 2046
Reputation: 931
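I solved it this way, by telling R that the file is ISO-8859-1 when opening the connection:
library(jsonlite)
url <- "http://dadosabertos.dataprev.gov.br/opendata/act10/formato=json"
con <- file(url, encoding = "ISO-8859-1")
a <- readLines(con, warn = FALSE)
close(con)
prev <- fromJSON(a)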
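To illustrate why declaring the encoding helps, here is a minimal offline sketch in base R. It assumes the word "Espécie" from the error message was sent as ISO-8859-1 bytes; the byte vector is typed in by hand for illustration and is not taken from the server. Note that `validUTF8()` needs R 3.3.0 or later, so it would not run on R 3.1.3.

```r
# Hand-typed ISO-8859-1 bytes for "Espécie" (illustrative, not downloaded).
# 0xE9 is "é" in ISO-8859-1, but a lone 0xE9 is not valid UTF-8.
bytes <- as.raw(c(0x45, 0x73, 0x70, 0xe9, 0x63, 0x69, 0x65))
s <- rawToChar(bytes)

# A UTF-8 parser rejects this string, which matches the "invalid bytes" error.
was_valid <- validUTF8(s)

# Converting from the real source encoding yields a valid UTF-8 string.
fixed <- iconv(s, from = "ISO-8859-1", to = "UTF-8")
is_valid <- validUTF8(fixed)

print(c(before = was_valid, after = is_valid))
```

Passing `encoding = "ISO-8859-1"` to the connection asks R to do this kind of conversion while reading, so the JSON parser only ever sees text it can decode.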
Upvotes: 1
Reputation: 125998
At least the csv version appears to be in ISO-8859-1 rather than UTF-8. You can use the curl
command to check the Content-Type like this:
$ curl -I "http://dadosabertos.dataprev.gov.br/opendata/con02/formato=csv"
HTTP/1.1 200 OK
Set-Cookie: ACE_STICKY=R835601189; path=/; expires=Thu, 19-May-2016 00:43:56 GMT
Server: nginx/1.2.4
Date: Wed, 18 May 2016 00:27:45 GMT
Content-Type: text/plain; charset=ISO-8859-1
Connection: keep-alive
X-Powered-By: PHP/5.3.3
Content-Disposition: attachment; filename="CON02.csv";
Access-Control-Allow-Origin: *
And from looking at the contents, that appears to be correct. I'm not familiar with R's encoding options, but try setting `options(encoding = "ISO-8859-1")` and see what happens.
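For the CSV route, here is a hedged base R sketch of reading ISO-8859-1 data without changing the global `options(encoding = ...)`. It runs offline: the temp file, header, and row are invented sample data standing in for the real CON02 download, and the variable names are only illustrative.

```r
# Write a tiny ISO-8859-1 CSV to a temp file so the sketch runs offline.
# The header and row are made-up sample data, not the real CON02 contents.
sample_utf8 <- "Ano,Esp\u00e9cie\n1988,Contribui\u00e7\u00e3o\n"
latin1_bytes <- iconv(sample_utf8, from = "UTF-8", to = "ISO-8859-1",
                      toRaw = TRUE)[[1]]
tmp <- tempfile(fileext = ".csv")
writeBin(latin1_bytes, tmp)

# Read the lines, mark them as Latin-1, then convert them to UTF-8.
lines_utf8 <- enc2utf8(readLines(tmp, encoding = "latin1"))

# Parse the converted text; check.names = FALSE keeps the accented header as-is.
prev <- read.csv(text = lines_utf8, stringsAsFactors = FALSE,
                 check.names = FALSE)
print(prev)

unlink(tmp)
```

For the real download, the same idea can probably be written more directly as `read.csv(url1, fileEncoding = "ISO-8859-1")`, which asks R to convert the file while reading it; I have not tested that against this server.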
Upvotes: 1