stochastic13

Reputation: 423

Logical condition while subsetting not giving correct values

I wanted to subset the data frame project that I was working with, using a logical condition, but I am getting a paradoxical result. The part of the condition preceding the ROLL.NO. argument is irrelevant to the question. Sorry, I could not give a reproducible example. Do let me know how I can make this question reproducible without having to show all 393 entries of the relevant columns in my data frame. D14 and DC31 are simple integer values, with some values being NA.

> culprits<-project$ROLL.NO.[(project$DC31==1&project$D14==2)|(project$DC31==2&project$D14==1)&!is.na(project$DC31)&!is.na(project$D14)]
> culprits
 [1] 3138 3129 3129 3135 3135 3136 3120 3126 3133 3125 3125 3125 3132 3132 3123 3123 3131
> project$HOUSE.NO[(project$DC31==1&project$D14==2)|(project$DC31==2&project$D14==1)&!is.na(project$DC31)&!is.na(project$D14)&project$ROLL.NO.==3131]
[1] "14/132" "14/176" "16/133" "14/111" "14/252"
> project$HOUSE.NO[(project$DC31==1&project$D14==2)|(project$DC31==2&project$D14==1)&!is.na(project$DC31)&!is.na(project$D14)&project$ROLL.NO.==3129]
[1] "14/132" "15/162" "14/176" "16/133" "14/111"
> project$ROLL.NO.[(project$DC31==1&project$D14==2)|(project$DC31==2&project$D14==1)&!is.na(project$DC31)&!is.na(project$D14)&project$ROLL.NO.==3136]
[1] 3129 3136 3120 3123 3123
> project$ROLL.NO.[(project$DC31==1&project$D14==2)|(project$DC31==2&project$D14==1)&!is.na(project$DC31)&!is.na(project$D14)&project$ROLL.NO.==3125]
[1] 3129 3120 3125 3125 3125 3123 3123
> project$ROLL.NO.[project$ROLL.NO.==3136]
[1] 3136 3136 3136 3136 3136 3136 3136 3136 3136

I tried to understand what was going wrong in my code, and I have included the results of those queries above. Since project$ROLL.NO.==3136 is FALSE for every other ROLL.NO., I fail to see why other ROLL.NO. values are returned when the remaining conditions are combined with it using &. Moreover, the same three entries erroneously appear alongside whichever ROLL.NO. is requested. There are no NA values in the ROLL.NO. column, and the logical vectors in each of the conditions have the same length, so there is no recycling. Do let me know if additional information is needed.

ADDENDUM

project <-  structure(list(ROLL.NO. = c(3138L, 3138L, 3138L, 3138L, 3138L, 
3138L, 3138L, 3138L, 3138L, 3138L, 3138L, 3138L, 3138L, 3138L, 
3138L, 3138L, 3138L, 3138L, 3138L, 3138L, 3138L, 3129L, 3129L, 
3129L, 3129L, 3129L, 3129L, 3129L, 3129L, 3129L, 3129L, 3129L, 
3129L, 3129L, 3129L, 3129L, 3129L, 3129L, 3129L, 3129L, 3129L, 
3129L, 3129L, 3129L, 3121L, 3121L, 3121L, 3121L, 3121L, 3121L
), DC31 = c(2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 
1L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 
1L, 2L, 2L, 2L, 2L), D14 = c(2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 
1L, 2L, 1L, 2L, 0L, 1L, 2L, 2L, 0L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L), HOUSE.NO = c("14/274", 
"14/259", "14/217", "14/258", "14/306", "14/300", "14/96", "14/166", 
"14/69", "14/68", "14/16", "14/93", "14/130", "14/321", "14/324", 
"14/139", "14/314", "14/323", "14/208", "14/78", "14/150", "14/155", 
"14/102", "14/132", "14/159", "14/163", "14/165", "14/146", "14/148", 
"14/104", "14/56", "14/53", "14/99", "14/48", "15/164", "15/148", 
"15/158", "15/107", "15/160", "15/162", "15/243", "15/66", "15/249", 
"15/86", "14/388", "14/396", "14/431", "14/401", "14/103", "15/36"
)), .Names = c("ROLL.NO.", "DC31", "D14", "HOUSE.NO"), row.names = c(NA, 
50L), class = "data.frame")

Upvotes: 0

Views: 44

Answers (1)

rawr

Reputation: 20811

From ?base::Logic, help('&'), help('|'), etc.:

See Syntax for the precedence of these operators: unlike many other languages (including S) the AND and OR operators do not have the same precedence (the AND operators have higher precedence than the OR operators).

which explains why

TRUE | TRUE & FALSE
# [1] TRUE

which is essentially

TRUE | (TRUE & FALSE)

which is also true, and a simplification of what you are doing here:

(project$DC31==1&project$D14==2) |
  (project$DC31==2&project$D14==1) &
  !is.na(project$DC31) &
  !is.na(project$D14) &
  project$ROLL.NO. == 3131

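I assume you expect the result to contain only rows where project$ROLL.NO. == 3131. However, the chain of & conditions binds only to the second parenthesised group, so any row for which the first group (DC31 == 1 & D14 == 2) is TRUE is selected regardless of its ROLL.NO. That is why you get some rows whose ROLL.NO. is not 3131.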
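One way to get the intended behaviour is to wrap the whole OR expression in an extra pair of parentheses, so that the NA checks and the ROLL.NO. test apply to both branches. The sketch below uses a small made-up data frame (toy, not your project data) with the same column names, just to show the difference between the two groupings:

```r
# Hypothetical toy data standing in for `project`; the values are made up
toy <- data.frame(
  ROLL.NO. = c(3131L, 3129L, 3131L, 3129L, 3131L),
  DC31     = c(1L,    1L,    2L,    2L,    NA),
  D14      = c(2L,    2L,    1L,    1L,    1L)
)

# Same shape as the original condition: because & binds tighter than |,
# the ROLL.NO. test only restricts the second branch
loose <- (toy$DC31 == 1 & toy$D14 == 2) |
  (toy$DC31 == 2 & toy$D14 == 1) &
  !is.na(toy$DC31) & !is.na(toy$D14) &
  toy$ROLL.NO. == 3131

# Extra parentheses: the OR is evaluated first, then every & filter
# applies to rows from either branch
strict <- ((toy$DC31 == 1 & toy$D14 == 2) | (toy$DC31 == 2 & toy$D14 == 1)) &
  !is.na(toy$DC31) & !is.na(toy$D14) &
  toy$ROLL.NO. == 3131

toy$ROLL.NO.[loose]   # includes a 3129 row that matched the first branch
toy$ROLL.NO.[strict]  # only 3131 rows
```

With the extra parentheses, the row with ROLL.NO. 3129 that satisfies DC31 == 1 & D14 == 2 is no longer selected.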

Also note that ! has higher precedence than the logical operators & and |, but lower precedence than the comparison operators such as ==.
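A few one-liners illustrate where ! sits relative to the other operators (the variable names are only for illustration):

```r
# ! binds tighter than &, so this is (!TRUE) & FALSE
neg_then_and <- !TRUE & FALSE

# Parentheses force the & to be evaluated before the negation
and_then_neg <- !(TRUE & FALSE)

# ! binds looser than ==, so this is !(1 == 2), not (!1) == 2
neg_of_comparison <- !1 == 2

neg_then_and
and_then_neg
neg_of_comparison
```

This is why !is.na(project$DC31) & !is.na(project$D14) works as intended without extra parentheses: each ! is applied to its is.na() call before the & combines them.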

Upvotes: 2
