Qbik

Reputation: 6157

Reading file containing numerical values and unknown errors (random strings) in R

Some .csv files with numerical data that I work with contain errors, each marked by a random string. For example, after reading such a file in, the data frame could look like this:

set.seed(123)
rand.str <-  paste0(letters[sample(10)], collapse="")
wrong.output <- data.frame(a=1:5, b=c(4:5, rand.str, 7:8), stringsAsFactors=FALSE)

In this case the proper output is:

proper.output <- data.frame(a=1:5, b=c(4:5, NA, 7:8))

After reading with read.csv, every column that contains at least one character value is treated as a character column.

Can I mark the errors (random strings) as NA while reading in the file? If not, what is the most convenient, proper, or fastest method for replacing them with NAs?

There is an na.strings argument in read.csv, but it only solves the simpler cases where the error markers are known in advance, for example: na.strings=c("-", "unavailable")
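For the simple case where the error markers are known up front, a minimal sketch of the na.strings approach might look like this. The inline CSV text and the variable names are made up for illustration.

```r
# Hypothetical inline CSV: "-" and "unavailable" are error markers known in advance.
csv_text <- "a,b\n1,4\n2,-\n3,unavailable\n4,7\n"

# Both markers are read as NA, so column b can still be parsed as a number.
known <- read.csv(text = csv_text, na.strings = c("-", "unavailable"))
str(known)
```

This does not help with arbitrary random strings, because every marker has to be listed explicitly.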

(I can't find a duplicate, so I guess there is a simple workaround.)

The colClasses suggestion does not work:

read.csv("test.txt", sep=",", colClasses = c("numeric", "numeric"))

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  scan() expected 'a real', got 'chdgfajibe'
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
  incomplete final line found by readTableHeader on 'test.txt'

Upvotes: 1

Views: 83

Answers (2)

user5249203

Reputation: 4648

I adapted this from an answer to a different CSV-reading question from 7 years ago. I think it is a cleaner solution, and it gives your desired output.

# Dummy class used only to attach a custom coercion method
setClass("Alpha")
# Strip alphabetic characters; an entry left empty converts to NA
setAs("character", "Alpha",
      function(from) as.numeric(gsub('[[:alpha:]]+', '', from)))
read.csv('data.csv', colClasses = c('numeric', 'Alpha'))

Output:

  a  b
1 1  4
2 2  5
3 3 NA
4 4  7
5 5  8

Source: How to read data when some numbers contain commas as thousand separator?
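A self-contained variant of the same idea is sketched below, assuming that any value which does not parse as a number should become NA. The class name NumOrNA and the inline CSV text are hypothetical, and as.numeric is used directly instead of stripping letters with gsub.

```r
library(methods)  # provides setClass() and setAs()

# Hypothetical class name; it exists only to carry the coercion method.
setClass("NumOrNA")
setAs("character", "NumOrNA",
      function(from) suppressWarnings(as.numeric(from)))

# Inline stand-in for the CSV file from the question.
csv_text <- "a,b\n1,4\n2,5\n3,chdgfajibe\n4,7\n5,8\n"
df <- read.csv(text = csv_text, colClasses = c("numeric", "NumOrNA"))
str(df)
```

The difference in behaviour matters for mixed values: the gsub version would turn "12abc" into 12, while this sketch turns it into NA.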

Upvotes: 1

Qbik

Reputation: 6157

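The solution is to coerce every column with as.numeric; values that cannot be parsed become NA (with a coercion warning):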

wrong.output[] <- lapply(wrong.output, as.numeric)
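As a rough sketch of the same approach on reproducible data, the conversion can be limited to character columns and the expected coercion warning silenced. The fixed string "chdgfajibe" stands in for the random one, and is_chr is a helper name introduced here.

```r
# Rebuild the example data frame with a fixed error string.
wrong.output <- data.frame(a = 1:5,
                           b = c(4:5, "chdgfajibe", 7:8),
                           stringsAsFactors = FALSE)

# Convert only the character columns; unparseable values become NA.
is_chr <- vapply(wrong.output, is.character, logical(1))
wrong.output[is_chr] <- lapply(wrong.output[is_chr],
                               function(x) suppressWarnings(as.numeric(x)))
str(wrong.output)
```

Restricting the conversion to character columns leaves integer columns such as a untouched, whereas applying as.numeric to every column would turn them into doubles.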

Upvotes: 0
