Geiser

Reputation: 1089

Encoding(?) error while trying to read a CSV file in R

I'm opening a CSV file in Ubuntu 14.04 LTS with LibreOffice and gedit, and in both apps I can see the file OK.

I've tried many ways to read that file into R using read.csv(), but it replaces the blank spaces " " with dots ".", and does other strange things to characters that aren't letters.

I've tried with

codepages <- setNames(iconvlist(), iconvlist())
x <- lapply(codepages, function(enc) try(read.csv("ticket.csv", fileEncoding=enc)))

But it always fails, no matter which encoding I use.

EDIT:

Here is the file:

https://drive.google.com/open?id=0B1P26eyiBDcNWWR0OGJwU0E4V00

Upvotes: 0

Views: 190

Answers (1)

hrbrmstr

Reputation: 78832

In a Linux shell:

$ enca -f -L none ticket_import_template.csv
7bit ASCII characters

$ file ticket_import_template.csv
ticket_import_template.csv: ASCII text, with very long lines

Looks like plain ASCII. In R:

dplyr::glimpse(read.csv("ticket_import_template.csv", check.names=FALSE))

## Observations: 1
## Variables:
## $ Ticket Number                             (lgl) NA
## $ *Open Date                                (fctr) 01/08/15 11:00
## $ 1st Response Date                         (lgl) NA
## $ Due Date                                  (lgl) NA
## $ Close Date                                (lgl) NA
## $ Status Type                               (lgl) NA
## $ Client (user name)                        (int) 950000
## $ Localizaci?n                              (lgl) NA
## $ *Request Type (semicolon delimited)       (fctr) Instalaciones
## $ Priority Type                             (lgl) NA
## $ Subject                                   (lgl) NA
## $ *Request Detail                           (fctr) PROBANDO
## $ Tech Username                             (lgl) NA
## $ Recurso Numbers                           (lgl) NA
## $ Notes                                     (lgl) NA
## $ Room                                      (lgl) NA
## $ Department                                (lgl) NA
## $ Tecnico asignado                          (lgl) NA
## $ Delete? (Y/N)                             (lgl) NA
## $ NOTE: * = Field required for new records. (lgl) NA
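
To see the effect of check.names in isolation, here is a minimal sketch that uses a small inline CSV instead of the real file. The two header names are borrowed from the output above; the csv_text, default_names and kept_names variables are just illustrative names.

```r
# Hypothetical two-column CSV with headers similar to the real file's.
csv_text <- "Ticket Number,*Open Date\n1,01/08/15 11:00\n"

# Default behaviour: headers are passed through make.names(),
# so spaces and other invalid characters become dots.
default_names <- names(read.csv(text = csv_text))

# With check.names = FALSE the headers are kept exactly as written.
kept_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)
print(kept_names)
```

The first print should show dotted names such as Ticket.Number, while the second shows the original headers with their spaces and asterisk.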

You can also just use:

  • readr::read_csv("ticket_import_template.csv")
  • rio::import("ticket_import_template.csv")
  • data.table::fread("ticket_import_template.csv")

since their defaults leave the spaces (and more) in the header column names.

read.csv is adding .'s to the column headers because they have spaces in them; the dots make the headers syntactically valid R names. You'll end up regretting either spaced names (which you get with check.names = FALSE) or dotted column names that long, so the first thing you should probably do is rename the columns.
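
One possible way to do that renaming is sketched below. The clean_names() helper is hypothetical (it is not part of base R), and it assumes plain ASCII headers like the ones in this file; the inline CSV only stands in for the real one.

```r
# Hypothetical helper: lower-case the names and collapse anything that is
# not a letter or digit into a single underscore.
clean_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of other characters -> "_"
  x <- gsub("^_+|_+$", "", x)      # drop leading/trailing underscores
  x
}

# Small stand-in for the real file, read with the original headers intact.
df <- read.csv(
  text = "Ticket Number,*Open Date,Client (user name)\nNA,01/08/15 11:00,950000\n",
  check.names = FALSE
)

names(df) <- clean_names(names(df))
print(names(df))
```

Note that a header starting with a digit, such as "1st Response Date", would still need backticks after this cleanup, so you may prefer to rename those few columns by hand.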

Upvotes: 2
