I need to identify leavers in a survey dataset. For this, I would like to add another column to my data that counts the consecutive NAs, starting at one specific column (in the example below, the last column, f9) and counting backwards.
I have already counted the total number of NAs as explained here, and although a high NA count is a pretty good indicator, I'd like to make sure respondents didn't just skip parts of the questionnaire instead of actually leaving.
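As a sketch of the general case (counting backwards from an arbitrary column rather than always from the last one), the hypothetical helper cons_na_from() below keeps only the columns up to from and then counts, per row, the NAs that precede the first non-NA value when the row is read backwards. The function name and the toy data are illustrative assumptions, not part of the original question.

```r
# Hypothetical helper: count consecutive NAs per row, starting at column
# `from` and moving backwards towards the first column.
cons_na_from <- function(df, from) {
  # `from` may be a column name or a column number
  end <- if (is.character(from)) match(from, names(df)) else from
  if (is.na(end)) stop("unknown column: ", from)
  # logical matrix: TRUE where columns 1..end are missing
  miss <- is.na(df[, seq_len(end), drop = FALSE])
  # per row: reverse, then count the NAs before the first non-NA value
  apply(miss, 1, function(r) sum(cumsum(!rev(r)) == 0))
}

# small illustrative data, not the survey data from the question
toy <- data.frame(q1 = c(1, NA, 3), q2 = c(NA, NA, 5),
                  q3 = c(NA, NA, NA), q4 = c(7, NA, NA))
cons_na_from(toy, "q4")  # expected: 0 4 2
cons_na_from(toy, "q3")  # expected: 2 3 1
```

With the example df from this question, cons_na_from(df, "f9") counts from the last column, which is the case shown in the expected output.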
Here's some example data:
df <- structure(list(f1 = c(3, 3, 1, 2, 3, 2, 2, NA, 2, 3), f2num = c(170,
NA, 182, 173, 169, NA, NA, NA, 153, 178), f3num = c(105, NA,
77, 80, 58, NA, NA, NA, 45, 81), f4num = c(2, NA, 0, NA, NA,
NA, 1, NA, 0, 0), f5num = c(9, NA, 1, NA, NA, NA, 2, NA, 0, 2
), f6num = c(NA, NA, NA, NA, NA, NA, 0, NA, NA, NA), f7 = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), f7num = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), f8num = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), f9 = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_)), .Names = c("f1", "f2num", "f3num", "f4num",
"f5num", "f6num", "f7", "f7num", "f8num", "f9"), row.names = c(NA,
10L), class = "data.frame")
> df
   f1 f2num f3num f4num f5num f6num f7 f7num f8num f9
1   3   170   105     2     9    NA NA    NA    NA NA
2   3    NA    NA    NA    NA    NA NA    NA    NA NA
3   1   182    77     0     1    NA NA    NA    NA NA
4   2   173    80    NA    NA    NA NA    NA    NA NA
5   3   169    58    NA    NA    NA NA    NA    NA NA
6   2    NA    NA    NA    NA    NA NA    NA    NA NA
7   2    NA    NA     1     2     0 NA    NA    NA NA
8  NA    NA    NA    NA    NA    NA NA    NA    NA NA
9   2   153    45     0     0    NA NA    NA    NA NA
10  3   178    81     0     2    NA NA    NA    NA NA
My expected output should look like this:
> df
   f1 f2num f3num f4num f5num f6num f7 f7num f8num f9 consNA
1   3   170   105     2     9    NA NA    NA    NA NA      5
2   3    NA    NA    NA    NA    NA NA    NA    NA NA      9
3   1   182    77     0     1    NA NA    NA    NA NA      5
4   2   173    80    NA    NA    NA NA    NA    NA NA      7
5   3   169    58    NA    NA    NA NA    NA    NA NA      7
6   2    NA    NA    NA    NA    NA NA    NA    NA NA      9
7   2    NA    NA     1     2     0 NA    NA    NA NA      4
8  NA    NA    NA    NA    NA    NA NA    NA    NA NA     10
9   2   153    45     0     0    NA NA    NA    NA NA      5
10  3   178    81     0     2    NA NA    NA    NA NA      5
Jthorpe's answer to this question got me as far as
t(apply(df, 1, function(x) which.min(rev(is.na(x))) - 1))
     1 2 3 4 5 6 7 8 9 10
[1,] 5 9 5 7 7 9 4 0 5  5
which is almost what I need, but it does not work when every value in a row is NA (see row 8, which gets 0 instead of 10).
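The 0 for row 8 comes from which.min(): when every element of rev(is.na(x)) is TRUE, all elements tie for the minimum and which.min() returns the first index, 1, so the result is 1 - 1 = 0. A minimal sketch of one way to patch this swaps which.min() for match() with an explicit nomatch value; the helper name trailing_na and the toy data are hypothetical.

```r
# Count trailing NAs in one row without which.min():
# match() finds the first FALSE in the reversed is.na() vector, i.e. the
# last non-NA value. For an all-NA row there is no FALSE, so `nomatch`
# supplies length(x) + 1 and the count becomes length(x).
trailing_na <- function(x) {
  match(FALSE, rev(is.na(x)), nomatch = length(x) + 1L) - 1L
}

# small illustrative data: row 2 is entirely NA
toy <- data.frame(a = c(1, NA), b = NA_real_, c = NA_real_)
apply(toy, 1, trailing_na)  # expected: 2 3
```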
This is a bit clumsy, but it works:
# for each row: reverse it and count the NAs before the first non-NA value
df$consNA <- apply(df, 1, function(x) sum(cumsum(!is.na(rev(x))) == 0))
df$consNA
#[1] 5 9 5 7 7 9 4 10 5 5
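For every row, we reverse its order and count the leading NAs until the first non-NA value is encountered. cumsum(!is.na(rev(x))) stays at 0 exactly as long as only NAs have been seen, so summing the positions where it equals 0 gives the length of that run. An all-NA row never leaves 0, so it gets the full count (10 for row 8).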
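The apply() call above runs an R function once per row. As an alternative sketch (a different technique from the answer's cumsum() trick), max.col() with ties.method = "last" can find the rightmost non-NA column of every row at once, and the count is the number of columns minus that index. The helper name cons_na_vec and the toy data are assumptions for illustration.

```r
# Vectorised sketch: trailing NA count per row via max.col().
cons_na_vec <- function(df) {
  # numeric 0/1 matrix: 1 = value present, 0 = NA
  present <- (!is.na(df)) + 0
  # column index of the rightmost present value in each row
  last <- max.col(present, ties.method = "last")
  # in an all-NA row every entry ties at 0, so max.col() returns ncol;
  # set those rows to 0 so the whole row is counted
  last[rowSums(present) == 0] <- 0L
  ncol(present) - last
}

toy <- data.frame(q1 = c(1, NA, 3), q2 = c(NA, NA, 5),
                  q3 = c(NA, NA, NA), q4 = c(7, NA, NA))
cons_na_vec(toy)  # expected: 0 4 2
```

Applied to the question's ten-column df (before consNA is added), it gives the same values as the apply() version: 5 9 5 7 7 9 4 10 5 5.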