Reputation: 41
> str(store)
'data.frame': 1115 obs. of 10 variables:
$ Store : int 1 2 3 4 5 6 7 8 9 10 ...
$ StoreType : Factor w/ 4 levels "a","b","c","d": 3 1 1 3 1 1 1 1 1 1 ...
$ Assortment : Factor w/ 3 levels "a","b","c": 1 1 1 3 1 1 3 1 3 1 ...
$ CompetitionDistance : int 1270 570 14130 620 29910 310 24000 7520 2030 3160 ...
$ CompetitionOpenSinceMonth: int 9 11 12 9 4 12 4 10 8 9 ...
$ CompetitionOpenSinceYear : int 2008 2007 2006 2009 2015 2013 2013 2014 2000 2009 ...
$ Promo2 : int 0 1 1 0 0 0 0 0 0 0 ...
$ Promo2SinceWeek : int NA 13 14 NA NA NA NA NA NA NA ...
$ Promo2SinceYear : int NA 2010 2011 NA NA NA NA NA NA NA ...
$ PromoInterval : Factor w/ 4 levels "","Feb,May,Aug,Nov",..: 1 3 3 1 1 1 1 1 1 1 ...
I'm trying to replace NAs depending on the Promo2 value. If Promo2 == 0, the NA values in that row need to be zero; if Promo2 == 1, the missing values should be replaced by the column mean.
I don't understand why my code doesn't modify the store data.
for (i in 1:nrow(store)){
if(is.na(store[i,])== TRUE & store$Promo2[i] ==0){
store[i,] <- ifelse(is.na(store[i,]),0,store[i,])
}
else if (is.na(store[i,])== TRUE & store$Promo2[i] ==1){
for(j in 1:ncol(store)){
store[is.na(store[i,j]), j] <- mean(store[,j], na.rm = TRUE)
}
}
}
Upvotes: 1
Views: 3144
Reputation: 28461
To fix the for loop:
for(i in 1:nrow(store)) {
col <- which(is.na(store[i,]))
store[i,][col] <- if(store$Promo2[i] == 1) colMeans(store[col], na.rm=TRUE) else 0
}
Or if you don't want any if statements:
for (i in 1:nrow(store)) {
store[i,][is.na(store[i,]) & store$Promo2[i] ==0] <- 0
store[i,][is.na(store[i,]) & store$Promo2[i] ==1] <-
colMeans(store[,is.na(store[i,]) & store$Promo2[i] ==1], na.rm = TRUE)
}
Your loop isn't working because an if statement accepts a single logical value as its condition. Your loop passes it is.na(store[i,])== TRUE & store$Promo2[i] ==0, which is evaluated once per column of the row and so yields a whole series of values (TRUE FALSE FALSE FALSE TRUE ...) where exactly one TRUE or FALSE is expected. When given several values, if uses only the first one and issues a warning; since R 4.2.0 it is an error instead.
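A minimal sketch of the point above, using a small made-up data frame (demo, row_na and na_cols are hypothetical names, not from the question). It shows that is.na() on one row returns one value per column, and one way to collapse that to a single value with any() before it reaches if:

```r
# Hypothetical toy data, only to show the shape of the condition.
demo <- data.frame(Promo2 = c(0, 1), gear = c(NA, 4), carb = c(NA, 2))

# is.na() on a single row gives one logical per column, not a single value.
row_na <- is.na(demo[1, ])
length(row_na)

# Collapse to a single TRUE/FALSE before handing it to if().
# && is the scalar form of & and is the usual choice inside if().
if (any(row_na) && demo$Promo2[1] == 0) {
  na_cols <- which(row_na)   # column positions holding NA in this row
  demo[1, na_cols] <- 0
}

demo
```

Whether any() or all() is the right way to collapse the condition depends on what the test is meant to express; here any() matches "this row has at least one NA".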
Reproducible example
store
# Promo2 gear carb
#Mazda RX4 1 NA NA
#Mazda RX4 Wag 1 4 4
#Datsun 710 1 4 1
#Hornet 4 Drive 0 3 1
#Hornet Sportabout 0 3 NA
#Valiant 0 3 1
for(i in 1:nrow(store)) {
col <- which(is.na(store[i,]))
store[i,][col] <- if(store$Promo2[i] == 1) colMeans(store[col], na.rm=TRUE) else 0
}
store
# Promo2 gear carb
#Mazda RX4 1 3.4 1.75
#Mazda RX4 Wag 1 4.0 4.00
#Datsun 710 1 4.0 1.00
#Hornet 4 Drive 0 3.0 1.00
#Hornet Sportabout 0 3.0 0.00
#Valiant 0 3.0 1.00
Data
store <- head(mtcars)
store <- store[-(1:8)]
names(store)[1] <- "Promo2"
store[1,2] <- NA
store[5,3] <- NA
store[1,3] <- NA
store
Upvotes: 0
Reputation: 3710
For the Promo2SinceWeek column:
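# Fill the Promo2==1 rows first so the mean is taken from observed values only, not from the zeros inserted below.
store$Promo2SinceWeek[store$Promo2==1 & is.na(store$Promo2SinceWeek)] <- mean(store$Promo2SinceWeek, na.rm=TRUE)
store$Promo2SinceWeek[store$Promo2==0 & is.na(store$Promo2SinceWeek)] <- 0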
For the other columns, use the same approach. Vectorized operations are a very useful feature of R.
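To apply the same idea to every column at once, one option is a short loop over the column names. This is only a sketch under a few assumptions: the columns to fill are numeric, each mean is taken from the observed values before any zeros are written, and the toy data and the name fill_cols are made up for illustration rather than taken from the question's data.

```r
# Hypothetical toy data mirroring the layout of the question's columns.
store <- data.frame(
  Promo2          = c(0, 1, 1, 0, 1),
  Promo2SinceWeek = c(NA, 13, 14, NA, NA),
  Promo2SinceYear = c(NA, 2010, 2012, NA, NA)
)

# Numeric columns to fill, excluding the Promo2 flag itself.
fill_cols <- setdiff(names(store)[sapply(store, is.numeric)], "Promo2")

for (col in fill_cols) {
  missing  <- is.na(store[[col]])
  col_mean <- mean(store[[col]], na.rm = TRUE)  # computed before any filling

  store[[col]][missing & store$Promo2 == 0] <- 0
  store[[col]][missing & store$Promo2 == 1] <- col_mean
}

store
```

Computing col_mean once per column, before the zeros are written, keeps the result independent of row order. Factor columns such as StoreType are skipped by the is.numeric filter.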
Upvotes: 4