Donnie Z

Replacing NA with a value depending on the condition

> str(store)
'data.frame':   1115 obs. of  10 variables:
 $ Store                    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ StoreType                : Factor w/ 4 levels "a","b","c","d": 3 1 1 3 1 1 1 1 1 1 ...
 $ Assortment               : Factor w/ 3 levels "a","b","c": 1 1 1 3 1 1 3 1 3 1 ...
 $ CompetitionDistance      : int  1270 570 14130 620 29910 310 24000 7520 2030 3160 ...
 $ CompetitionOpenSinceMonth: int  9 11 12 9 4 12 4 10 8 9 ...
 $ CompetitionOpenSinceYear : int  2008 2007 2006 2009 2015 2013 2013 2014 2000 2009 ...
 $ Promo2                   : int  0 1 1 0 0 0 0 0 0 0 ...
 $ Promo2SinceWeek          : int  NA 13 14 NA NA NA NA NA NA NA ...
 $ Promo2SinceYear          : int  NA 2010 2011 NA NA NA NA NA NA NA ...
 $ PromoInterval            : Factor w/ 4 levels "","Feb,May,Aug,Nov",..: 1 3 3 1 1 1 1 1 1 1 ...

I'm trying to replace NAs depending on the Promo2 value: if Promo2 == 0, the NA values in that row should become zero; if Promo2 == 1, the missing values should be replaced by the column mean.

I don't understand why my code doesn't modify the store data frame.

for (i in 1:nrow(store)){
  if(is.na(store[i,])== TRUE & store$Promo2[i] ==0){
    store[i,] <- ifelse(is.na(store[i,]),0,store[i,])
  }
  else if (is.na(store[i,])== TRUE & store$Promo2[i] ==1){
    for(j in 1:ncol(store)){
      store[is.na(store[i,j]), j] <- mean(store[,j], na.rm = TRUE)
    }
  }
}

Answers (2)

Pierre L

To fix the for loop:

for(i in 1:nrow(store)) {
  col <- which(is.na(store[i,]))
  store[i,][col] <- if(store$Promo2[i] == 1) colMeans(store[col], na.rm=TRUE) else 0
}
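One caveat with this loop: colMeans() is evaluated on the current store at every iteration, so once some rows have been filled (with 0 or a mean), rows processed later get means that include those filled values. A minimal sketch that computes the means of the observed values once, before the loop. col_means is a name introduced here for illustration, and non-numeric columns (such as the factors in the real store) simply get NA as their "mean". The example data is the mtcars-based store from the Data section:

```r
# Example data: the same mtcars-based `store` built in the Data section
store <- head(mtcars)[-(1:8)]
names(store)[1] <- "Promo2"
store[1, 2] <- NA
store[5, 3] <- NA
store[1, 3] <- NA

# Means of the observed values, computed once before anything is replaced.
# Non-numeric columns (e.g. factors) get NA, since a mean makes no sense there.
col_means <- vapply(store,
                    function(x) if (is.numeric(x)) mean(x, na.rm = TRUE) else NA_real_,
                    numeric(1))

for (i in seq_len(nrow(store))) {
  col <- which(is.na(store[i, ]))   # positions of the NA columns in row i
  if (length(col) == 0) next        # nothing to fill in this row
  store[i, col] <- if (store$Promo2[i] == 1) col_means[col] else 0
}
store
```

On this small example the result matches the output in the reproducible example, because the only Promo2 == 1 row with NAs (Mazda RX4) is processed first; the two versions differ once such a row comes after other rows have already been filled.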

Or, if you don't want any if statements:

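for (i in 1:nrow(store)) {

  store[i,][is.na(store[i,]) & store$Promo2[i] == 0] <- 0

  # drop = FALSE keeps a data frame even when only one column is selected,
  # which colMeans() needs (a single column would otherwise become a vector)
  store[i,][is.na(store[i,]) & store$Promo2[i] == 1] <-
       colMeans(store[, is.na(store[i,]) & store$Promo2[i] == 1, drop = FALSE], na.rm = TRUE)

}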
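The same logical-indexing idea can also be applied to the whole data frame at once, with no loop: is.na(store) gives a logical matrix, and because Promo2 has one value per row, combining it with that matrix recycles it down each column, so the right cells are marked row by row. A sketch on the mtcars-based example data; it assumes every column is numeric, as in that example (with the real store, which has factor columns, the means would have to be computed for the numeric columns only). missing_cells, to_zero and to_mean are illustrative names:

```r
# Example data: the same mtcars-based `store` built in the Data section
store <- head(mtcars)[-(1:8)]
names(store)[1] <- "Promo2"
store[1, 2] <- NA
store[5, 3] <- NA
store[1, 3] <- NA

missing_cells <- is.na(store)             # logical matrix, one cell per value
means <- colMeans(store, na.rm = TRUE)    # observed-value means (all columns numeric here)

# Promo2 has one value per row, so it is recycled down each column of the matrix
to_zero <- missing_cells & store$Promo2 == 0
to_mean <- missing_cells & store$Promo2 == 1

store[to_zero] <- 0
# Cells are filled column by column; col() gives each cell's column index,
# so each selected cell receives the mean of its own column
store[to_mean] <- means[col(missing_cells)[to_mean]]
store
```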

Your loop isn't working because if expects a single condition: one TRUE or one FALSE. Your loop passes it is.na(store[i,]) == TRUE & store$Promo2[i] == 0, which returns one logical value per column (for example FALSE FALSE ... TRUE TRUE ...), not a single value. Before R 4.2.0, if used only the first element and issued a warning; since R 4.2.0, a condition of length greater than one is an error. Here the first element comes from the Store column, which looks like an ID column with no missing values, so the condition is always FALSE, neither branch runs, and store is never modified.
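To see this directly, build the condition for one row and inspect it. A small sketch with a made-up data frame (df and its Week column are illustrative stand-ins for store and its NA-containing columns):

```r
# Tiny made-up stand-in for `store`: an ID column, Promo2, and one column with NAs
df <- data.frame(Store = 1:2, Promo2 = c(0L, 1L), Week = c(NA, 13L))

cond <- is.na(df[1, ]) & df$Promo2[1] == 0
print(length(cond))   # 3: one logical per column, not a single TRUE/FALSE
print(cond[1])        # FALSE: it comes from the Store column, which has no NAs

# which() turns the logical values into the column positions to replace,
# so no if() is needed for this row
df[1, which(cond)] <- 0
print(df)
```

When a single TRUE/FALSE really is what you want, any() or all() collapse a logical vector into the one value that if() accepts.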

Reproducible example

store
#                  Promo2 gear carb
#Mazda RX4              1   NA   NA
#Mazda RX4 Wag          1    4    4
#Datsun 710             1    4    1
#Hornet 4 Drive         0    3    1
#Hornet Sportabout      0    3   NA
#Valiant                0    3    1

for(i in 1:nrow(store)) {
  col <- which(is.na(store[i,]))
  store[i,][col] <- if(store$Promo2[i] == 1) colMeans(store[col], na.rm=TRUE) else 0
}

store
#                  Promo2 gear carb
#Mazda RX4              1  3.4 1.75
#Mazda RX4 Wag          1  4.0 4.00
#Datsun 710             1  4.0 1.00
#Hornet 4 Drive         0  3.0 1.00
#Hornet Sportabout      0  3.0 0.00
#Valiant                0  3.0 1.00

Data

store <- head(mtcars)
store <- store[-(1:8)]
names(store)[1] <- "Promo2"
store[1,2] <- NA
store[5,3] <- NA
store[1,3] <- NA
store

Ven Yao

For the Promo2SinceWeek column:

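# Fill with the mean first, so the mean is computed from observed values only
# (if the zeros were filled first, they would be included in the mean)
store$Promo2SinceWeek[store$Promo2==1 & is.na(store$Promo2SinceWeek)] <- mean(store$Promo2SinceWeek, na.rm=TRUE)
store$Promo2SinceWeek[store$Promo2==0 & is.na(store$Promo2SinceWeek)] <- 0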

For the other columns, use the same approach. Vectorized subsetting like this replaces every matching value in one step, without a loop, and is one of R's most useful features.
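To apply the same rule to every numeric column at once, one option is to wrap the two assignments in a small function and apply it with lapply(). The sketch below uses a made-up data frame in place of the real store, and fill_by_promo() is a hypothetical helper name. Promo2 itself is excluded so its type is left untouched, and factor columns are skipped because a mean makes no sense for them:

```r
# Small made-up stand-in for the real `store` data frame
store <- data.frame(
  Promo2          = c(0L, 1L, 1L, 0L),
  Promo2SinceWeek = c(NA, 13L, NA, NA),
  Promo2SinceYear = c(NA, 2010L, 2011L, NA)
)

# Hypothetical helper: fill the NAs of one column according to Promo2
fill_by_promo <- function(x, promo) {
  m <- mean(x, na.rm = TRUE)      # mean of the observed values, taken before any filling
  x[is.na(x) & promo == 1] <- m   # Promo2 == 1: column mean
  x[is.na(x) & promo == 0] <- 0   # Promo2 == 0: zero
  x
}

# Numeric columns other than Promo2 itself
num_cols <- vapply(store, is.numeric, logical(1)) & names(store) != "Promo2"
store[num_cols] <- lapply(store[num_cols], fill_by_promo, promo = store$Promo2)
store
```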
