Reputation: 31
I have tried every method I could find on Stack Overflow regarding this topic, and nothing worked.
Here is a sample of my dataset called TEST:
x2000 x2001 x2002
100 1200 230
200 2002 280
: 1980 :
":" represents a missing value. The problem is that I cannot replace this colon with R-accepted NA.
What I have tried:
sum(TEST %in% c(":"))
returns: [1] 0
TEST[TEST == ":"] <- NA  # does nothing
I also tried saving the file as .csv and replacing the values with "NA" in Excel, and it still does nothing. The columns are not factors. If a column contains the value ":" then the column is "chr"; otherwise it is "int".
Upvotes: 0
Views: 1489
Reputation: 38520
Probably the easiest method is to set the na.strings argument when reading in the data with one of the read.* family of functions. Here is an example with read.table for your example data:
df <- read.table(header = TRUE, text = "x2000 x2001 x2002
100 1200 230
200 2002 280
: 1980 :", na.strings = ":")
This returns
df
x2000 x2001 x2002
1 100 1200 230
2 200 2002 280
3 NA 1980 NA
Perhaps more importantly, every column of the resulting data.frame is an integer vector:
str(df)
'data.frame': 3 obs. of 3 variables:
$ x2000: int 100 200 NA
$ x2001: int 1200 2002 1980
$ x2002: int 230 280 NA
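Without this, any column containing ":" is read as text: a factor in R versions before 4.0.0 (where stringsAsFactors defaulted to TRUE) or a character vector in later versions. You then end up with a mixture of integer and non-numeric columns, which complicates the cleaning process a bit.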
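If the data is already loaded and re-reading it is not convenient, the columns can also be repaired in place. The following is only a rough sketch, not a confirmed fix for your file: the data frame is a hypothetical stand-in for TEST, and it assumes the exact comparison with ":" failed because of stray whitespace around the colon, which the question does not confirm.

```r
# Hypothetical stand-in for TEST after a plain read: ":" kept as text,
# one of them padded with spaces (an assumption about the real data).
TEST <- data.frame(
  x2000 = c("100", "200", " : "),
  x2001 = c(1200L, 2002L, 1980L),
  x2002 = c("230", "280", ":"),
  stringsAsFactors = FALSE
)

# Work column by column: trim whitespace, turn ":" into NA, then convert.
TEST[] <- lapply(TEST, function(col) {
  col <- trimws(as.character(col))  # as.character also covers factor columns
  col[col == ":"] <- NA             # exact match is safe after trimming
  type.convert(col, as.is = TRUE)   # text -> integer where every value allows it
})

str(TEST)
```

Assigning into TEST[] keeps the object a data.frame while replacing each column. If the colons are padded with something other than ordinary whitespace (for example a non-breaking space), trimws with its default settings may not strip it, and the pattern would need adjusting.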
Upvotes: 4