I'm teaching myself R and I think this code counts the number of times a survey has a value (not NA) for all 4 of these variables in ()?
Can someone confirm or correct me? Thanks for helping out a nervous newbie. I need this number for a denominator (surveys without missing data). Thanks!
sum(!is.na(Both_hwstations) &
!is.na(Both_latrines) &
!is.na(rapid_unique$hf_ipcfocal) &
!is.na(rapid_unique$water_avail)
)
First of all, I think the answer to your question
this code counts the number of times a survey has a value (not NA) for all 4 of these variables in ()?
is yes... or rather: yes, but...
Just to illustrate what everyone is commenting on:
This is a simplified version of the problem:
varA <- c(7:10)
varB <- c(1:3, NA)
df <- data.frame(v1 = 1:4,
v2 = 11:14)
varA, varB and df all have a length of 4 (or 4 rows, in the case of df).
varA
[1] 7 8 9 10
varB
[1] 1 2 3 NA
df
v1 v2
1 1 11
2 2 12
3 3 13
4 4 14
Your sum code
sum(!is.na(varA) &
!is.na(varB) &
!is.na(df$v1) &
!is.na(df$v2))
Returns:
[1] 3
Because R treats TRUE as 1 and FALSE as 0 when doing math with logical values, so the sum is the number of positions where all four tests are TRUE. So far so good...
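A minimal sketch of that counting behaviour on its own; the vector name flags is made up for the illustration and is not from the question:

```r
# Hypothetical logical vector: one entry per survey, TRUE = no missing value
flags <- c(TRUE, FALSE, TRUE, TRUE)

# sum() coerces TRUE to 1 and FALSE to 0, so it counts the TRUE entries
n_true <- sum(flags)

# mean() uses the same coercion, so it gives the share of TRUE entries
share_true <- mean(flags)

print(n_true)
print(share_true)
```

Here sum(flags) is the kind of number you would use as a denominator, and mean(flags) is the proportion of surveys without missing data.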
But if we change the vectors to
varA <- c(NA, 0)
varB <- c(1)
varA
[1] NA 0
varB
[1] 1
What would the expected result of the sum code be in this case? One row without NA? The result of the sum code now depends on the somewhat idiosyncratic way R handles vectors of different lengths:
sum(!is.na(varA) &
!is.na(varB) &
!is.na(df$v1) &
!is.na(df$v2))
Returns:
[1] 2
Because R (in the background) repeated varA twice to make it the same length as df$v1, and repeated varB four times. This is R's recycling rule, and here it happens without any warning.
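To make the recycling visible, here is a small sketch that spells out the repetition by hand with rep_len() and compares it to what & does implicitly. It uses a plain vector v1 in place of df$v1; the names recycled, manual and same_length are only for this illustration:

```r
varA <- c(NA, 0)
varB <- c(1)
v1   <- 1:4   # stands in for df$v1

# What the sum code evaluates: shorter vectors are silently recycled
recycled <- !is.na(varA) & !is.na(varB) & !is.na(v1)

# The same thing with the repetition written out explicitly
manual <- rep_len(!is.na(varA), 4) & rep_len(!is.na(varB), 4) & !is.na(v1)

print(recycled)
print(identical(recycled, manual))

# A cheap guard before combining vectors: do they all have the same length?
same_length <- length(unique(lengths(list(varA, varB, v1)))) == 1
print(same_length)
```

A check like same_length is one way to catch this before counting; if it is FALSE, the vectors probably do not describe the same set of surveys.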
For that and other reasons it's often safer to explicitly bind the columns together first, and then use something like complete.cases to test for these things. That way we can be sure(r) that we don't accidentally mistake artefacts of the programming language for data ;)
We could instead do (with the original length-4 varA and varB from the first example):
df$varA <- varA
df$varB <- varB
df
v1 v2 varA varB
1 1 11 7 1
2 2 12 8 2
3 3 13 9 3
4 4 14 10 NA
And use base R's complete.cases:
df[complete.cases(df), ]
v1 v2 varA varB
1 1 11 7 1
2 2 12 8 2
3 3 13 9 3
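Since the question asks for a count rather than the rows themselves, a sketch along these lines may be closer to what is needed. It rebuilds the example df from above; the names n_complete and n_complete_subset are made up for the illustration:

```r
# Same example data as above, built in one step
df <- data.frame(v1 = 1:4,
                 v2 = 11:14,
                 varA = 7:10,
                 varB = c(1:3, NA))

# complete.cases() returns one TRUE/FALSE per row; sum() counts the TRUEs
n_complete <- sum(complete.cases(df))
print(n_complete)

# To test only some of the columns, subset the data frame first
n_complete_subset <- sum(complete.cases(df[, c("v1", "v2")]))
print(n_complete_subset)
```

With the real data this would presumably mean putting all four variables into rapid_unique first and then subsetting to those four columns, but that depends on where Both_hwstations and Both_latrines come from.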