user3015289

Reputation: 31

R: replacing a missing-value placeholder with NA

I have tried every method I could find on Stack Overflow for this topic, and none of them worked.

Here is a sample of my dataset called TEST:

x2000  x2001  x2002
100    1200   230
200    2002   280
:      1980   :

":" represents a missing value. The problem is that I cannot replace this colon with R's NA.

What I have tried:

sum(TEST %in% c(":"))
returns: [1] 0

TEST[TEST == ":"] <- NA  # does nothing

I also tried saving the file as .csv and replacing the values with "NA" in Excel, but that did nothing either. The columns are not factors. If a column contains ":", it is read as "chr"; otherwise it is "int".

Upvotes: 0

Views: 1489

Answers (1)

lmo

Reputation: 38520

Probably the easiest method is to set the na.strings argument when reading in the data with one of the read.* family of functions (read.table, read.csv, and so on). Here is an example with read.table for your example data:

df <- read.table(header = TRUE, text = "x2000 x2001 x2002
100   1200   230
200   2002   280
:     1980   :  ", na.strings = ":")

This returns

df
  x2000 x2001 x2002
1   100  1200   230
2   200  2002   280
3    NA  1980    NA

Perhaps more importantly, every column of the resulting data.frame is an integer vector:

str(df)
'data.frame':   3 obs. of  3 variables:
 $ x2000: int  100 200 NA
 $ x2001: int  1200 2002 1980
 $ x2002: int  230 280 NA
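
The same argument should carry over to read.csv if your data lives in a .csv file. Here is a minimal sketch, assuming a comma-separated file; the temporary file and its path are made up for illustration, and strip.white = TRUE is my own addition in case stray spaces surround the colon:

```r
# Hypothetical file standing in for the real .csv (path is invented for this sketch).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("x2000,x2001,x2002",
             "100,1200,230",
             "200,2002,280",
             ":,1980,:"), csv_path)

# na.strings = ":" turns the colon into NA while the file is parsed;
# strip.white = TRUE trims whitespace around unquoted fields before that check.
TEST <- read.csv(csv_path, na.strings = ":", strip.white = TRUE)

str(TEST)             # inspect the column types
colSums(is.na(TEST))  # count the missing values per column
```

Checking colSums(is.na(TEST)) right after the import is a quick way to confirm that the placeholder was actually recognized.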

Without na.strings, you will end up with a mixture of integer vectors and character columns (or factor columns in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE), which complicates the cleaning process a bit.
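
If re-reading the file is not an option, the already-loaded data.frame can be cleaned in place. This is only a sketch: the TEST object below is a hand-built stand-in for your data, and I am assuming (without having seen the real file) that an exact comparison with ":" can fail because of surrounding whitespace, so the values are trimmed first:

```r
# Hand-built stand-in for the data as it looks after a plain import:
# columns containing ":" are character, the rest are integer.
TEST <- data.frame(
  x2000 = c("100", "200", ":"),
  x2001 = c(1200L, 2002L, 1980L),
  x2002 = c("230", "280", ": "),   # trailing space is an assumed complication
  stringsAsFactors = FALSE
)

# Only the character columns need fixing.
is_chr <- vapply(TEST, is.character, logical(1))

TEST[is_chr] <- lapply(TEST[is_chr], function(col) {
  col <- trimws(col)       # drop leading/trailing whitespace
  col[col == ":"] <- NA    # replace the placeholder with NA
  as.integer(col)          # convert the cleaned strings to integers
})

str(TEST)
```

Setting na.strings at import time is still the simpler route, since it avoids the type conversion step entirely.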

Upvotes: 4
