John

Reputation: 5239

R: How to replace elements of a data.frame?

I'm trying to replace elements of a data.frame containing "#N/A" with "NULL", and I'm running into problems:

foo <- data.frame("day"= c(1, 3, 5, 7), "od" = c(0.1, "#N/A", 0.4, 0.8))

indices_of_NAs <- which(foo == "#N/A") 

replace(foo, indices_of_NAs, "NULL")

Error in [<-.data.frame(*tmp*, list, value = "NULL") : new columns would leave holes after existing columns

I think the problem is that my index treats the data.frame as a vector while the replace function treats it differently, but I'm not sure what the issue is.

Upvotes: 14

Views: 73786

Answers (3)

Aashu

Reputation: 131

Why not

x$col[is.na(x$col)]<-value

?
You won't have to change your data frame.
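As a minimal sketch of that one-liner: the data frame `x`, the column `col`, and the fill value `0` below are placeholders invented for illustration. Note that `is.na()` only matches real `NA` entries; a string such as "#N/A" is not `NA` and would have to be matched with `==` instead.

```r
# Hypothetical data: `x`, `col`, and the fill value 0 are placeholders
x <- data.frame(id = 1:4, col = c(0.1, NA, 0.4, NA))

# The logical index picks the NA entries; only those are overwritten
x$col[is.na(x$col)] <- 0

print(x$col)
```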

Upvotes: 12

mdsumner

Reputation: 221

NULL really means "nothing", not "missing", so it cannot take the place of an actual value; for missing values R uses NA.

You can use the replacement method of is.na to update the selected elements directly; it works with a logical result. (Using which for indices will only work with is.na. Using those indices directly with [ invokes list access, which is the cause of your error.)

foo <- data.frame("day"= c(1, 3, 5, 7), "od" = c(0.1, "#N/A", 0.4, 0.8)) 
NAs <- foo == "#N/A"

## by replace method
is.na(foo)[NAs] <- TRUE

## or directly
foo[NAs] <- NA

But you are already dealing with strings in your od column (a factor by default in R versions before 4.0.0), because c() coerced the mixed values to character when the column was created, so you might need to treat columns individually. A numeric column, for example, will never match the string "#N/A".
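One way to treat the column individually might look like the sketch below. It assumes you want od to end up numeric; `stringsAsFactors = FALSE` is added here only so the column is character on every R version.

```r
# stringsAsFactors = FALSE keeps `od` as character on any R version
foo <- data.frame(day = c(1, 3, 5, 7),
                  od  = c(0.1, "#N/A", 0.4, 0.8),
                  stringsAsFactors = FALSE)

# Mark the placeholder strings as missing in the one column that can hold them
foo$od[foo$od == "#N/A"] <- NA

# Convert the remaining strings back to numbers; NA stays NA
foo$od <- as.numeric(foo$od)

str(foo)
```

After this, functions such as `mean(foo$od, na.rm = TRUE)` can skip the missing entry.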

Upvotes: 19

Shane

Reputation: 100164

The replace function expects a vector and you're supplying a data.frame.
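A rough sketch of what goes wrong, and of replace applied to a single column instead. The name `pos` is introduced here for illustration, and `stringsAsFactors = FALSE` is an added assumption so that od is a character vector on every R version.

```r
foo <- data.frame(day = c(1, 3, 5, 7),
                  od  = c(0.1, "#N/A", 0.4, 0.8),
                  stringsAsFactors = FALSE)

# foo == "#N/A" is a 4 x 2 logical matrix; which() returns a position
# counted down the columns of that matrix
pos <- which(foo == "#N/A")
print(pos)

# A data frame only has ncol(foo) list elements, so a single number used
# with [ is read as a column index, and `pos` is larger than ncol(foo)
print(ncol(foo))

# replace() behaves as expected on a single column, which is a vector
foo$od <- replace(foo$od, foo$od == "#N/A", NA)
print(foo$od)
```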

You should really try to use NA and NULL instead of the character values that you're currently using. Otherwise you won't be able to take advantage of all of R's functionality to handle missing values.

Edit

You could use an apply function, or do something like this:

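foo <- data.frame(day = c(1, 3, 5, 7), od = c(0.1, NA, 0.4, 0.8))
# Row/column positions of every NA, as a two-column matrix
idx <- which(is.na(foo), arr.ind = TRUE)
# Index with the whole matrix so this also works for more than one NA;
# note that assigning a string turns od into a character column
foo[idx] <- "NULL"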

You cannot assign a real NULL value in this case, because it has length zero. It is important to understand the difference between NA and NULL, so I recommend that you read ?NA and ?NULL.
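A quick way to see the difference for yourself (the vector names `v` and `w` are just for illustration):

```r
length(NA)    # 1: NA is a value that occupies a slot
length(NULL)  # 0: NULL is the absence of any value

v <- c(1, NA, 3)    # NA is kept as an element
w <- c(1, NULL, 3)  # NULL is dropped when combining

length(v)  # 3
length(w)  # 2
```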

Upvotes: 1
