user3614783

Reputation: 841

Non-standard character prevents me from reading in full csv file in R

My csv file (accessible through link and viewable through screenshot) has 8 observations. Obs #5 has a non-standard character in the "author" column. I've shaded this yellow.

https://docs.google.com/spreadsheets/d/1-douIz03OQqahG6WCWY-irOE52oXtDDc4fJ6myMwJDk/edit?usp=sharing


When I run the following:

data1 <- read.csv("Book1.csv", colClasses = c("end_date_n" = "character", "start_date_n" = "character"), stringsAsFactors = FALSE)

I get this warning message and only the first 4 rows and a partial 5th row are imported. The import stops at the point where the non-standard character appears in col 5.

In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string

When I delete the "author" column from my csv source file, the import works fine.

How can I import the full file without having to delete the problem column?

Upvotes: 0

Views: 287

Answers (1)

user3614783

Reputation: 841

A colleague came up with this solution:

"The original character is ^z, which for decades was used by DOS/Windows as an end of file marker. Because UNIX systems never used ^z, the read-in problem is Windows-specific. Windows systems often direct users to enter non-ASCII characters (like é) using “ALT” codes. This may be where the ^z originates."

"Use a utility to translate ^z to something innocuous. The killZ function below takes the name of a file, translates each ^z to *, then writes the result to the same directory as the original file, with -noz inserted just before the .txt or .csv (or whatever) file type. You can then read the -noz file in the same way you have been reading the original .txt or .csv file."

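killZ <- function(fname) {
  # Open in binary mode so that ^Z is not treated as an end-of-file marker
  f <- file(fname, "rb")
  # Close the connection even if a later step fails
  on.exit(close(f))
  res <- readLines(f)
  # Translate each ^Z (octal 032) to *
  res <- gsub("\032", "*", res, fixed = TRUE)
  # Create the new file name, e.g. Book1.csv becomes Book1-noz.csv
  ftype <- stringr::str_extract(fname, "\\..{1,3}$")
  # Drop the file type by length; gsub(ftype, ...) would treat "." as a regex wildcard
  stem <- substr(fname, 1, nchar(fname) - nchar(ftype))
  new_name <- paste0(stem, "-noz", ftype)
  writeLines(res, con = new_name)
  return(new_name)
}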
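
To confirm that a ^Z byte really is what cuts the import short, one option is to scan the raw bytes of the file before translating anything. The sketch below is only an illustration: it builds a small made-up file (the `path` variable and its contents are assumptions, not the asker's data) and reports the byte offsets at which 0x1A occurs. Pointing `path` at the real CSV should work the same way.

```r
# Hypothetical demo file; replace `path` with the real CSV, e.g. "Book1.csv"
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("a,b\n1,x"), as.raw(0x1a), charToRaw("y\n")), path)

# Read the whole file as raw bytes, so no end-of-file handling gets in the way
bytes <- readBin(path, what = "raw", n = file.size(path))

# 1-based byte offsets of every ^Z (0x1A); integer(0) if the file has none
z_pos <- which(bytes == as.raw(0x1a))
z_pos
```

An empty result would suggest the truncated import has some other cause, such as an unmatched quote character in the data.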

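If writing a second file is not wanted, a variation on the same idea is to clean the bytes in memory and pass the text straight to read.csv. This is not the colleague's function, just a sketch under the same assumption that ^Z is the only offending byte; the demo file and its contents are made up for illustration.

```r
# Hypothetical demo file with a ^Z (0x1A) inside the "author" column
tmp <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("id,author\n1,Smith\n2,Jos"), as.raw(0x1a), charToRaw(" Diaz\n3,Lee\n")),
  tmp
)

# Read the raw bytes, then overwrite each ^Z with "*" (0x2A)
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
bytes[bytes == as.raw(0x1a)] <- charToRaw("*")

# Parse the cleaned text directly instead of writing a -noz copy
demo <- read.csv(text = rawToChar(bytes), stringsAsFactors = FALSE)
nrow(demo)
```

With the real file, the same colClasses argument used in the question can be passed to read.csv alongside `text`.
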
Upvotes: 0
