Fomite

Reputation: 2273

R appending characters to a CSV file on import

I have a file, found here

A very odd error has come up that I can sometimes reproduce but can't figure out. When the file is imported into R, special characters are occasionally added to the name of the 'number' column, so data$number no longer works.

For example when running:

library(readr)
mers3 <- read_csv("~/Documents/Code/AnalysisInEpi/Week 3 - Binomial Regression/PS3/mers3.csv")

The resulting output is

Parsed with column specification:
cols(
  'number' = col_integer(),
)

The actual name of the column is number, with no quote marks. On my machine, this goes away when I use the base R read.csv() function, but on another user's machine it persists, with a different set of special characters. I've opened the file in text editors on two machines and can't see any encoding errors. The original file was created via export from Excel.

Does anyone know what might be happening?

As an update, opening and resaving the file in Xcode appears to have fixed things, though doing the same in BBEdit did not.

Upvotes: 2

Views: 85

Answers (1)

ruaridhw

Reputation: 2345

Row 1080 of the file contains

?840,67,NA,1,0,0,1,0,1,0

readr is the only package to complain (correctly) about the presence of the "?" where there should be a number.

As for the output you've provided, this is typical of the readr package and is a message rather than an error. It reports the column types readr inferred, so if you want it to disappear, you can specify the column types yourself.
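As a rough sketch of specifying the types yourself: the snippet below writes a small stand-in file (hypothetical data with only two columns, not the real mers3.csv) containing a stray "?" like the one on row 1080, reads it with explicit col_types, and then asks readr which cells it could not parse. The `age` column name is an assumption made for illustration.

```r
library(readr)

# Hypothetical miniature of the file; the second data row has a stray "?".
tmp <- tempfile(fileext = ".csv")
writeLines(c("number,age", "839,54", "?840,67", "841,30"), tmp)

# Declaring the column types up front suppresses the column specification message.
dat <- read_csv(
  tmp,
  col_types = cols(number = col_integer(), age = col_integer())
)

# "?840" cannot be parsed as an integer, so readr stores NA and warns.
# problems() lists the cells that failed to parse as the declared type.
problems(dat)
```

With the real file you would list every column in cols(), or use cols(.default = ...) to set a fallback type for the ones you don't name.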

Otherwise, read.csv() and data.table::fread() will load your CSV as-is without trouble.
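Note that "as-is" here means a stray character such as "?840" is kept as text, so the whole column would normally come in as character rather than integer. A minimal sketch of locating such entries after read.csv(), again using a hypothetical two-column stand-in for the real file:

```r
# Hypothetical miniature of the file; the second data row has a stray "?".
tmp <- tempfile(fileext = ".csv")
writeLines(c("number,age", "839,54", "?840,67", "841,30"), tmp)

dat <- read.csv(tmp, stringsAsFactors = FALSE)

# Because of "?840", the number column is read as character, not integer.
str(dat$number)

# Flag non-missing entries that are not plain integers.
bad <- which(!is.na(dat$number) & !grepl("^-?[0-9]+$", dat$number))
dat[bad, ]
```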

Upvotes: 4
