LearneR

Reputation: 2531

Where is my 'if-else' block going wrong?

I have a data frame df with the following data:

ID      P1  P2  Year    Month     A      B
11084   23  43  2001    April     41.9  -99.99
67985   76  12  2001    May       6.9   -9.99
11084   34  64  2001    June      -999  -99.99
34084   56  77  2001    July      NA    -99.99
11043   90  54  2001    August    NA    -99.99
23084   55  32  2001    September 50.8  -99.99
11084   77  14  2001    October   0     -99.99
54328   89  56  2001    November  -999  -99.99

I'm trying to add two new columns, A_miss and B_miss, holding 'Yes' for the records where A or B is missing and 'No' otherwise. My expected output is:

ID      P1  P2  Year    Month     A      B      A_miss B_miss
11084   23  43  2001    April     41.9  -99.99  No     Yes
67985   76  12  2001    May       6.9    123    No     No
11084   34  64  2001    June      -999  -99.99  Yes    Yes
34084   56  77  2001    July      NA    -99.99  Yes    Yes
11043   90  54  2001    August    NA    -99.99  Yes    Yes
23084   55  32  2001    September 50.8  -99.99  No     Yes
11084   77  14  2001    October   0     -99.99  No     Yes
54328   89  56  2001    November  -999  -99.99  Yes    Yes

I'm new to R. I tried to achieve this with a simple for loop and if/else conditions, as follows:

for(i in length(df$A))
{
  if(df$A[i] == -999 || df$A[i] == 'NA')

    df$A_miss[i] <- 'Yes'

  else  
     df$A_miss[i] <- 'No'
}

I tried the loop on column 'A' first, but only the else branch executes every time, and the whole 'A_miss' column is filled with 'No'. I can't work out why the if branch never runs.

Where am I going wrong?

Upvotes: 1

Views: 87

Answers (4)

Acarbalacar

Reputation: 734

Using which() instead of a loop might speed up the process:

# Start with 'No' everywhere, then overwrite the rows where A is missing
df$A_miss <- 'No'
df$A_miss[which(df$A == -999 | is.na(df$A))] <- 'Yes'

Upvotes: 0

RHertel

Reputation: 23788

Your loop is not correctly defined. This one works:

for (i in 1:length(df$A)) {
    if(df$A[i] == -999 || is.na(df$A[i]) )
        df$A_miss[i] <- 'Yes'

    else  
        df$A_miss[i] <- 'No'
}

The range should be written as (i in 1:length(df$A)), not as (i in length(df$A)). length(df$A) is a single number, so the original loop ran only once, for the last row. Hope this helps.
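To illustrate the difference between the two ranges, here is a minimal sketch on a toy vector (the names x, visited_wrong and visited_right are made up for this example and stand in for df$A and the loop indices):

```r
# Toy vector standing in for df$A (hypothetical data)
x <- c(41.9, 6.9, -999, NA)

visited_wrong <- integer(0)
for (i in length(x)) {       # length(x) is the single value 4
  visited_wrong <- c(visited_wrong, i)
}

visited_right <- integer(0)
for (i in seq_along(x)) {    # seq_along(x) is 1, 2, 3, 4
  visited_right <- c(visited_right, i)
}

print(visited_wrong)   # only the last index
print(visited_right)   # every index
```

seq_along(x) gives the same indices as 1:length(x) here, and it is also safe for an empty vector, where 1:length(x) would count down from 1 to 0.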

PS: As you can see, the important correction pointed out by @Pascal (testing for missing values with is.na() instead of comparing to 'NA') has been implemented here.
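The reason is.na() matters can be seen in a small sketch, assuming column A is numeric (the variable names below are made up for the example): a missing value is not the string 'NA', and comparing it with == yields NA rather than TRUE or FALSE.

```r
a <- NA_real_             # a missing numeric value, as in df$A

cmp_string <- a == 'NA'   # NA: any comparison with a missing value is NA
cmp_is_na  <- is.na(a)    # TRUE: the proper test for missingness

# For a missing value the first operand is NA, so the second one decides.
cond_original  <- a == -999 || a == 'NA'   # NA, which if() cannot use
cond_corrected <- a == -999 || is.na(a)    # TRUE

print(c(cmp_string, cmp_is_na, cond_original, cond_corrected))
```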

PPS: The version below should be much faster than your code with the for loop:

df$A_miss <- 'No'
df$A_miss[which(df$A == -999 | is.na(df$A))] <- 'Yes'

(I just noticed that this solution is very similar to the one that had been suggested earlier by @Daniel Fischer)

Upvotes: 3

N8TRO

Reputation: 3364

A vectorized version:

df <- structure(list(ID = c(11084L, 67985L, 11084L, 34084L, 11043L, 
23084L, 11084L, 54328L), P1 = c(23L, 76L, 34L, 56L, 90L, 55L, 
77L, 89L), P2 = c(43L, 12L, 64L, 77L, 54L, 32L, 14L, 56L), Year = c(2001L, 
2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L), Month = structure(c(1L, 
5L, 4L, 3L, 2L, 8L, 7L, 6L), .Label = c("April", "August", "July", 
"June", "May", "November", "October", "September"), class = "factor"), 
A = c(41.9, 6.9, -999, NA, NA, 50.8, 0, -999), B = c(-99.99, 
123, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99), A_miss = c("No", 
"No", "Yes", "Yes", "Yes", "No", "No", "Yes")), .Names = c("ID", 
"P1", "P2", "Year", "Month", "A", "B", "A_miss"), row.names = c(NA, 
-8L), class = "data.frame")

df$A_miss <- ifelse(df$A == -999 | is.na(df$A), "yes", "no")
df$B_miss <- ifelse(df$B == -99.99 | is.na(df$B), "yes", "no")

     ID P1 P2 Year     Month      A      B A_miss B_miss
1 11084 23 43 2001     April   41.9 -99.99     no    yes
2 67985 76 12 2001       May    6.9 123.00     no     no
3 11084 34 64 2001      June -999.0 -99.99    yes    yes
4 34084 56 77 2001      July     NA -99.99    yes    yes
5 11043 90 54 2001    August     NA -99.99    yes    yes
6 23084 55 32 2001 September   50.8 -99.99     no    yes
7 11084 77 14 2001   October    0.0 -99.99     no    yes
8 54328 89 56 2001  November -999.0 -99.99    yes    yes
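Since both columns get the same test with a different sentinel value, the two ifelse() calls could be wrapped in a small helper. This is only a sketch: flag_missing is a hypothetical name, not an existing function, and toy is a cut-down stand-in for the question's data, assuming the codes -999 and -99.99 mark missing values.

```r
# Hypothetical helper: "yes" where x is NA or equals the sentinel code
flag_missing <- function(x, sentinel) {
  ifelse(is.na(x) | x == sentinel, "yes", "no")
}

# Small stand-in for the question's columns A and B
toy <- data.frame(A = c(41.9, -999, NA, 0),
                  B = c(-99.99, 123, -99.99, NA))

toy$A_miss <- flag_missing(toy$A, -999)
toy$B_miss <- flag_missing(toy$B, -99.99)
print(toy)
```

Putting is.na(x) first is a readability choice. With |, an NA on one side and TRUE on the other still gives TRUE, so missing values are flagged either way.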

Upvotes: 2

Daniel Fischer

Reputation: 3380

Maybe you could try this, without any loop or if clause:

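    # Work on a copy so the original values in A are not overwritten
    df$A_miss <- df$A
    df$A_miss[(df$A == -999) | is.na(df$A)] <- "yes"
    df$A_miss[df$A_miss != "yes"] <- "no"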

Upvotes: 0
