Reputation: 1116
This is probably an easy one but I seem unable to figure it out.
I have a CSV file in which every entry is wrapped in quotes, including the numeric values, like this one, say yz.csv:
"y","z"
"1.1","bla"
"2.1","blubb"
So far, I have been reading and re-declaring these files using
dat <- read.table("yz.csv", sep=",", colClasses=rep("character",2), header=TRUE)
dat$y <- as.numeric(dat$y)
Now as the number of numeric columns increases, as in qz.csv
"q","r","s","t","u","v","w","x","y","z"
"1.1","1.2","1.3","1.4","1.5","1.6","1.7","1.8","1.9","bla"
"2.1","2.2","2.3","2.4","2.5","2.6","2.7","2.8","2.9","blubb"
I felt it was time to do this more professionally, to prevent the following from happening:
dat <- read.table("qz.csv", sep=",", colClasses=rep("character",10), header=TRUE)
dat$q <- as.numeric(dat$q)
dat$r <- as.numeric(dat$r)
...
dat$y <- as.numeric(dat$y)
Is there a way to get the read.table function to ignore the quotes around the numbers, so I can use
dat <- read.table("qz.csv", sep=",", colClasses=c(rep("numeric",9),"character"), header=TRUE)
which currently gives me the error that scan() expected 'a real', got '"1.2"'?
Edit: Here is the original file and here is the code I use for the original that is giving me the error:
doc <- read.csv("testfile.csv", colClasses=c("character","character",rep("NULL",50),rep("numeric",7),"NULL","NULL"), col.names=c("country","code",rep("bla",50),"doc08","doc09","doc10","doc11","doc12","doc13","doc14","bla","bla"), skip=4, check.names=FALSE, header=TRUE)
Upvotes: 1
Views: 1294
Reputation: 23
I think the best bet is to use NA in colClasses for the numeric columns:
dat <- read.table("qz.csv", sep=",", colClasses=c(rep(NA,9),"character"), header=TRUE)
The columns given as NA are then read as character first, so the quotes are stripped, and type.convert() recognizes the numeric data and converts it. Alternatively, to reduce the clutter of converting columns one by one after reading them, you can do something like
dat <- read.table("qz.csv", sep=",", colClasses=rep("character",10), header=TRUE)
dat[,1:9] <- lapply(dat[,1:9], as.numeric)
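For what it's worth, here is a self-contained sketch of the NA approach using the qz.csv data from the question. The data is passed inline through the text argument so that it runs without a file on disk; that, and the name csv_text, are my own choices for illustration. As ?read.table notes, quoting is only considered for columns read as character, which seems to be why forcing "numeric" fails on quoted numbers, whereas NA columns are read as character first and converted afterwards.

```r
# Inline copy of qz.csv from the question (csv_text is an illustrative name).
csv_text <- '"q","r","s","t","u","v","w","x","y","z"
"1.1","1.2","1.3","1.4","1.5","1.6","1.7","1.8","1.9","bla"
"2.1","2.2","2.3","2.4","2.5","2.6","2.7","2.8","2.9","blubb"'

# NA lets read.csv() strip the quotes and guess the type of the first nine
# columns; the last column is forced to stay character.
dat <- read.csv(text = csv_text,
                colClasses = c(rep(NA, 9), "character"))

# Inspect the resulting column types.
str(dat)
```

If a column contains anything that does not parse as a number, the NA approach will quietly leave it as character (or factor, depending on stringsAsFactors), so checking str(dat) after reading is probably worthwhile.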
Upvotes: 0
Reputation: 6659
Updated to show a minimal example. I can't figure it out; I've also tried the readr
package.
a <- textConnection('"A", "B", "C"
"a", "1", "1"
"b", "2", "2"')
df <- read.csv(a, colClasses=c("character", "NULL", "numeric"),
col.names=c("AA", "BB", "CC"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'a real', got '"1"'
library("readr")
> df <- read_csv(a, col_types=c("c", "-", "n"),
+ col_names=c("AA", "BB", "CC"))
Error in names(spec$cols) <- col_names :
'names' attribute [3] must be the same length as the vector [2]
In addition: Warning message:
Insufficient `col_types`. Guessing 2 columns.
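One possible workaround for the minimal example, sketched in base R only: read the quoted numeric column as "character" and convert it afterwards, while still dropping the unwanted column with "NULL". The data below is the example above with the spaces after the commas removed (my simplification), and it is passed through text= instead of a textConnection; csv_text is just an illustrative name.

```r
# Minimal example from above, without spaces after the commas (assumption).
csv_text <- '"A","B","C"
"a","1","1"
"b","2","2"'

# Read the third column as character so the quotes are stripped,
# and skip the second column entirely with "NULL".
df <- read.csv(text = csv_text,
               colClasses = c("character", "NULL", "character"),
               col.names  = c("AA", "BB", "CC"))

# Convert the quoted numbers after reading.
df$CC <- as.numeric(df$CC)

str(df)
```

This keeps the column skipping inside the read call and costs one extra conversion line per numeric column; with many such columns, lapply() over the relevant columns does the same thing in one step.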
Upvotes: 2