gregmacfarlane

Reputation: 2283

R: replace NA with item from vector

I am trying to replace some missing values in my data with the average values from a similar group.

My data looks like this:

   X   Y
1  x   y
2  x   y
3  NA  y
4  x   y

And I want it to look like this:

   X   Y
1  x   y
2  x   y
3  y   y
4  x   y

I wrote this, and it worked:

for(i in 1:nrow(data.frame)){
   if(is.na(data.frame$X[i])){
       data.frame$X[i] <- data.frame$Y[i]
   }
}

But my data.frame is almost half a million lines long, and the for/if statements are pretty slow. What I want is something like

is.na(data.frame$X) <- data.frame$Y

But this gets a mismatched size error. It seems like there should be a command that does this, but I cannot find it here on SO or on the R help list. Any ideas?

Upvotes: 10

Views: 17294

Answers (4)

Olsgaard

Reputation: 1582

If you are already using dplyr or tidyverse, you can use the coalesce function to do exactly this.

> library(dplyr)
> df <- data.frame(X=c("x", "x", NA, "x"), Y=rep("y",4), stringsAsFactors=FALSE)
> df %>% mutate(X = coalesce(X, Y))
  X Y
1 x y
2 x y
3 y y
4 x y

Upvotes: 1

RndmSymbl

Reputation: 553

Unfortunately I cannot comment yet, but while vectorizing some code that involved strings (character vectors), the approach above did not seem to work. The reason is explained in this answer: when characters are involved, stringsAsFactors=FALSE is not enough, because R might already have created factors out of the characters. One needs to ensure that the data becomes a character vector again, e.g., data.frame(X=as.character(c("x", "x", NA, "x")), Y=as.character(rep("y",4)), stringsAsFactors=FALSE)
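
A minimal sketch of that idea, assuming the columns already arrived as factors (forced here with stringsAsFactors=TRUE for illustration). The name is_factor is a hypothetical helper variable, not something from the answers here.

```r
# Build the example with factor columns to mimic data that was already converted.
df <- data.frame(X = c("x", "x", NA, "x"), Y = rep("y", 4),
                 stringsAsFactors = TRUE)

# Find the factor columns and turn them back into character vectors.
is_factor <- vapply(df, is.factor, logical(1))
df[is_factor] <- lapply(df[is_factor], as.character)

# With plain character columns, the vectorized replacement is straightforward.
na_rows <- is.na(df$X)
df$X[na_rows] <- df$Y[na_rows]
```

Converting first avoids assigning a value that is not among the factor's existing levels, which is where factor columns tend to cause trouble.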

Upvotes: 0

Richie Cotton

Reputation: 121077

ifelse is your friend.

Using the dataset from Dirk's answer:

df <- within(df, X <- ifelse(is.na(X), Y, X))
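
For reference, a self-contained version of the same call, as a sketch that assumes both columns are character vectors (stringsAsFactors=FALSE):

```r
# Same example data as in the other answers, with character (not factor) columns.
df <- data.frame(X = c("x", "x", NA, "x"), Y = rep("y", 4),
                 stringsAsFactors = FALSE)

# ifelse() is vectorized: for each row it picks Y where X is NA, otherwise X.
df <- within(df, X <- ifelse(is.na(X), Y, X))
```

If X and Y are factors, ifelse() does not preserve the factor class, so it is probably safest to convert them with as.character() first.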

Upvotes: 12

Dirk is no longer here

Reputation: 368261

Just vectorise it -- the boolean index test is one expression, and you can use that in the assignment too.

Setting up the data:

R> df <- data.frame(X=c("x", "x", NA, "x"), Y=rep("y",4), stringsAsFactors=FALSE)
R> df
     X Y
1    x y
2    x y
3 <NA> y
4    x y

And then proceed by computing an index of where to replace, and replace:

R> ind <- which( is.na( df$X ) )
R> df[ind, "X"] <- df[ind, "Y"]

which yields the desired outcome:

R> df
  X Y
1 x y
2 x y
3 y y
4 x y
R> 
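
As a variation on the steps above, the which() call can likely be dropped, since a logical vector works as an index on its own. A minimal sketch, where na_rows is an illustrative name:

```r
# Same example data as above.
df <- data.frame(X = c("x", "x", NA, "x"), Y = rep("y", 4),
                 stringsAsFactors = FALSE)

# Logical index: TRUE where X is missing.
na_rows <- is.na(df$X)

# Both sides are subset with the same index, so the lengths always match.
df$X[na_rows] <- df$Y[na_rows]
```

This also hints at why is.na(data.frame$X) <- data.frame$Y from the question cannot do the job: the replacement form is.na(x) <- value marks elements of x as NA rather than filling them in.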

Upvotes: 9
