Manoel Galdino

Reputation: 2396

Encoding problems on Mac

I'm trying to download a file and load it into R, but it's not working. I'm on a Mac, using R 3.1.3.

The file is in CSV format (and there's also a JSON version).

Here are the URLs for the file: CSV at http://dadosabertos.dataprev.gov.br/opendata/con02/formato=csv and JSON at http://dadosabertos.dataprev.gov.br/opendata/con02/formato=json

I know I can download the file, open it in a local text editor, save it as UTF-8, and then import it into R. But I'd like a more automated solution that doesn't involve other software. And, btw, even that workaround doesn't work as easily as I expected.

Here's what I tried so far. Since the file is in Portuguese, I assumed it's probably UTF-8:

```r
library(jsonlite)
options(encoding = "utf-8")
url <- "http://dadosabertos.dataprev.gov.br/opendata/con02/formato=json"
prev <- fromJSON(url)
```

And the error message:

```
lexical error: invalid bytes in UTF8 string.
    :[{"node":{"Ano":"1988","Esp�cie":"42-Ap Tempo Contribui��o
                     (right here) ------^
```
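A minimal sketch of why the parser rejects the data, assuming it really is ISO-8859-1 (Latin-1) as the answers below suggest: in Latin-1 the "é" in "Espécie" is the single byte 0xE9, which is not a valid UTF-8 sequence on its own. The bytes are typed in by hand for illustration; note that `validUTF8()` needs R 3.3.0 or later, so it is not available in the R 3.1.3 mentioned above.

```r
# "Espécie" encoded as ISO-8859-1 (Latin-1): "é" is the single byte 0xE9
latin1_bytes <- as.raw(c(0x45, 0x73, 0x70, 0xe9, 0x63, 0x69, 0x65))
x <- rawToChar(latin1_bytes)

# 0xE9 followed by "c" is not a valid UTF-8 sequence, hence "invalid bytes in UTF8 string"
validUTF8(x)   # FALSE

# Declaring the real source encoding and converting to UTF-8 gives a valid string
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)   # TRUE
y == "Esp\u00e9cie"
```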

I also tried reading the CSV version:

```r
url1 <- "http://dadosabertos.dataprev.gov.br/opendata/con02/formato=csv"
prev <- read.csv(url1, sep = ",")
```

That didn't work either. I also tried:

```r
Sys.setlocale("LC_ALL", "en_US.UTF-8")
```

But it didn't make any difference.

Upvotes: 0

Views: 2046

Answers (2)

José

Reputation: 931

I solved it this way:

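```r
library(jsonlite)
url <- "http://dadosabertos.dataprev.gov.br/opendata/act10/formato=json"
con <- file(url, encoding = "ISO-8859-1")  # declare the real encoding (Latin-1, not UTF-8)
a <- readLines(con, warn = FALSE)          # lines are re-encoded to the native encoding
close(con)                                 # readLines closes but does not destroy the connection
prev <- fromJSON(paste(a, collapse = "\n"))
```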
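The same idea (declare the source encoding, then hand UTF-8 text to the JSON parser) can be tried offline. This is a minimal sketch under stated assumptions: the JSON snippet is hypothetical sample content written to a temporary file instead of the dataprev URL, and it marks the lines as Latin-1 with `readLines(encoding = "latin1")` and converts them with `iconv()` rather than using `file(..., encoding =)`, which keeps the result independent of the session locale. `jsonlite` is only used if it is installed.

```r
# Hypothetical sample: a tiny JSON snippet stored as ISO-8859-1 (Latin-1) bytes,
# standing in for the file served by dadosabertos.dataprev.gov.br
json_utf8 <- '[{"Ano":"1988","Esp\u00e9cie":"42-Ap Tempo Contribui\u00e7\u00e3o"}]'
tmp <- tempfile(fileext = ".json")
writeBin(charToRaw(iconv(json_utf8, from = "UTF-8", to = "latin1")), tmp)

# Read the lines and mark them as Latin-1 (this marks, it does not convert)
raw_lines <- readLines(tmp, warn = FALSE, encoding = "latin1")

# Convert to UTF-8 so a UTF-8 JSON parser accepts the text
utf8_lines <- iconv(raw_lines, from = "latin1", to = "UTF-8")
txt <- paste(utf8_lines, collapse = "\n")

if (requireNamespace("jsonlite", quietly = TRUE)) {
  prev <- jsonlite::fromJSON(txt)
  str(prev)
}
```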

Upvotes: 1

Gordon Davisson

Reputation: 125998

At least the csv version appears to be in ISO-8859-1 rather than UTF-8. You can use the curl command to check the Content-Type like this:

```
$ curl -I "http://dadosabertos.dataprev.gov.br/opendata/con02/formato=csv"
HTTP/1.1 200 OK
Set-Cookie: ACE_STICKY=R835601189; path=/; expires=Thu, 19-May-2016 00:43:56 GMT
Server: nginx/1.2.4
Date: Wed, 18 May 2016 00:27:45 GMT
Content-Type: text/plain; charset=ISO-8859-1
Connection: keep-alive
X-Powered-By: PHP/5.3.3
Content-Disposition: attachment; filename="CON02.csv";
Access-Control-Allow-Origin: *
```

And from looking at the contents, that appears to be correct. I'm not familiar with R's encoding options, but try setting `options(encoding = "ISO-8859-1")` and see what happens.
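One way to act on that header from R for the CSV version is sketched below, under a few assumptions: `charset_from_content_type()` is a hypothetical helper written for this example, the header value is copied from the `curl -I` output above, and the sample rows are made up and written to a temporary file instead of downloading. `read.csv()`'s `fileEncoding` argument re-encodes the input as it is read, roughly a per-call version of `options(encoding = ...)`. Because it re-encodes into the session's native encoding, the result is only as expected in a UTF-8 locale such as the usual macOS default.

```r
# Hypothetical helper: extract the charset parameter from a Content-Type value;
# returns NA when there is none
charset_from_content_type <- function(content_type) {
  m <- regmatches(content_type,
                  regexec("charset=([^;[:space:]]+)", content_type, ignore.case = TRUE))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Header value taken from the curl -I output above
cs <- charset_from_content_type("text/plain; charset=ISO-8859-1")
print(cs)  # "ISO-8859-1"

# In R >= 3.4.0 the headers can also be fetched from R itself (needs network access):
# hdr <- curlGetHeaders("http://dadosabertos.dataprev.gov.br/opendata/con02/formato=csv")
# grep("^content-type:", hdr, ignore.case = TRUE, value = TRUE)

# Offline stand-in for the CSV (made-up rows), saved as Latin-1 bytes
tmp <- tempfile(fileext = ".csv")
csv_utf8 <- "Ano,Esp\u00e9cie\n1988,42-Ap Tempo Contribui\u00e7\u00e3o\n"
writeBin(charToRaw(iconv(csv_utf8, from = "UTF-8", to = "latin1")), tmp)

# fileEncoding makes read.csv re-encode the input from the declared charset
prev <- read.csv(tmp, fileEncoding = cs, stringsAsFactors = FALSE)
str(prev)
```

With network access, the same call should in principle also work directly on the URL, e.g. `read.csv("http://dadosabertos.dataprev.gov.br/opendata/con02/formato=csv", fileEncoding = "ISO-8859-1")`, though that has not been checked against the live server here.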

Upvotes: 1
