I'm running into an unexpected problem in R. My data frame has NAs in certain columns. Some of these NAs SHOULD be present (the values are truly missing), while others should be replaced with 0. I used code like the following:
```r
df1 <- data.frame(x = c(1, 2, 3, 4, 5), y = c(10, 10, NA, NA, 12), z = c(9, 9, 9, 9, 9))
for (i in nrow(df1)){
  if(df1$x[i] > 3){
    df1$y[i] = 0
    df1$z[i] = 0
  }
}
```
and obtained this output:
```
  x  y z
1 1 10 9
2 2 10 9
3 3 NA 9
4 4 NA 9
5 5  0 0
```
The NA in row 3 SHOULD be preserved, but the NA in row 4 should have been replaced with 0. The z value in row 4 did not update either. Any ideas what is happening?
Don't use a loop for this. R isn't Python: vectorized operations come out of the box, so a single logical index does the job:
```r
df1[df1$x > 3, c('y', 'z')] <- 0
df1
#   x  y z
# 1 1 10 9
# 2 2 10 9
# 3 3 NA 9
# 4 4  0 0
# 5 5  0 0
```
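A minimal sketch of the same vectorized assignment, split into steps so you can see what the condition evaluates to. Wrapping the condition in `which()` is my own addition, not part of the answer above: it converts the logical vector into row numbers and drops any NA, which only matters if `x` itself could contain NA.

```r
# Same data as in the question
df1 <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(10, 10, NA, NA, 12),
                  z = c(9, 9, 9, 9, 9))

# The comparison is evaluated for the whole column at once,
# giving one TRUE/FALSE per row
rows <- df1$x > 3
print(rows)  # FALSE FALSE FALSE TRUE TRUE

# Assign 0 to y and z only in the selected rows.
# which() returns the positions of the TRUE values and skips NA,
# so a missing x would simply leave that row untouched.
df1[which(rows), c("y", "z")] <- 0
print(df1)
```

Row 3 keeps its NA because `x[3]` is 3, which is not greater than 3, so that row is never selected.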
You've used `for (i in nrow(df1))`, which evaluates to `for (i in 5)`: `nrow()` returns a single number, so the loop body runs only once, for row 5. I'm guessing you meant `for (i in 1:nrow(df1))`, which evaluates to `for (i in 1:5)` and visits all rows.
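A sketch of the loop-based fix, assuming you want to keep the loop. It first records which values of `i` the original loop visits, then uses `seq_len(nrow(df1))`, a common alternative to `1:nrow(df1)` that produces no iterations for an empty data frame (where `1:0` would give `c(1, 0)`).

```r
# Same data as in the question
df1 <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(10, 10, NA, NA, 12),
                  z = c(9, 9, 9, 9, 9))

# nrow(df1) is a single number, so this loop runs exactly once
visited <- integer(0)
for (i in nrow(df1)) visited <- c(visited, i)
print(visited)  # only row 5 is visited

# seq_len(n) gives 1, 2, ..., n (and nothing at all when n is 0)
for (i in seq_len(nrow(df1))) {
  if (df1$x[i] > 3) {
    df1$y[i] <- 0
    df1$z[i] <- 0
  }
}
print(df1)
```

This gives the same result as the vectorized one-liner; the loop is just slower and more verbose on large data.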