ceoec
ceoec

Reputation: 203

read csv setting fields with spaces to NA

I have a csv file that looks like this:

A, B,  C, 
1, 2 1, 3,
3, 1, 0, 
4, 1, 0 5,
 ...

is it possible to set the na.string to assign all fields with space to NA (e.g. something like regex function(x){x[grep(patt="\\ ", x)]<-NA;x}), i.e.

A, B, C,
1, NA, 3,
3, 1, 0,
4, 1, NA,

Upvotes: 1

Views: 67

Answers (3)

bytesinflight
bytesinflight

Reputation: 1694

Variation on @akrun's answer (which I like).

library(dplyr)
read.csv("test.csv", colClasses="character") %>% mutate_each(funs(as.numeric))

This reads the file assuming all columns are character, then converts all to numeric with mutate_each from dplyr.

Using colClasses="numeric" already in the read call didn't work (and I don't know why :( ), since

> as.numeric("2 1")
[1] NA

From How to read data when some numbers contain commas as thousand separator? we learn that we can make a new function to do the conversion.

setAs("character", "numwithspace", function(from) as.numeric(from) )
read.csv("test.csv", colClasses="numwithspace")

which gives

  A  B  C
1 1 NA  3
2 3  1  0
3 4  1 NA

Upvotes: 2

akrun
akrun

Reputation: 887501

We can loop over the columns and set it to NA by converting to numeric

df1[] <- lapply(df1, as.numeric)

NOTE: Here, I assumed that the columns are character class. If it is factor, do lapply(df1, function(x) as.numeric(as.character(x)))

Upvotes: 2

Aaron
Aaron

Reputation: 24812

I don't know how this would translate in , but I would use the following regex to match fields containing spaces :

[^, ]+ [^, ]+

Which is :

  • some characters other than a comma or a space ([^, ]+)
  • followed by a space ()
  • and some more characters other than a comma or a space ([^, ]+)

You can see it in action here.

Upvotes: 1

Related Questions