Reputation: 339
My data frame looks like this:
Person W.1 W.2 W.3 W.4 W.5
1 62 57 52 59 NA
2 49 38 60 NA NA
3 59 34 NA NA NA
Is there a way to select the first and last tests that are not "NA"? I have 300 rows; W.1 is the first test, W.2 is the second test, and W.n is the nth test. I want to compare each person's score on the first test with their score on the last test they took. For example, I want to compare:
1 62 59
2 49 60
3 59 34
But the "NA" values appear in different positions for different people. Can someone help me?
Thank you!
Upvotes: 2
Views: 136
Reputation: 92282
Here's a possible vectorized solution using max.col (I'm assuming that the first test is never NA, though it can be easily fixed if otherwise)
indx <- cbind(seq_len(nrow(df)), max.col(!is.na(df), ties.method = "last"))
cbind(df[, 2], df[indx])
# [,1] [,2]
# [1,] 62 59
# [2,] 49 60
# [3,] 59 34
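If the first test can also be NA, one possible fix is to look up the first non-NA column the same way, with ties.method = "first". This is only a sketch: the data below is a hypothetical variant of the question's data in which person 3 missed the first test, and the names scores, ok, rows, first, last and result are mine.

```r
# Hypothetical data: person 3 has NA for the first test
df <- data.frame(Person = 1:3,
                 W.1 = c(62, 49, NA),
                 W.2 = c(57, 38, 59),
                 W.3 = c(52, 60, 34),
                 W.4 = c(59, NA, NA),
                 W.5 = c(NA, NA, NA))

scores <- as.matrix(df[-1])   # drop the Person column
ok <- !is.na(scores)          # TRUE where a score exists
rows <- seq_len(nrow(scores))

# Column of the first / last TRUE in each row, then pick those cells
first <- scores[cbind(rows, max.col(ok, ties.method = "first"))]
last  <- scores[cbind(rows, max.col(ok, ties.method = "last"))]

result <- cbind(first, last)
```

Because the Person column is dropped first, the column indices refer directly to the W.* columns.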
Another similar solution is to use rowSums (this one relies on the NAs appearing only at the end of each row)
cbind(df[, 2], df[cbind(seq_len(nrow(df)), rowSums(!is.na(df)))])
# [,1] [,2]
# [1,] 62 59
# [2,] 49 60
# [3,] 59 34
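To see how the two variants differ, here is a small sketch on a hypothetical single row with an NA in the middle (this row is not from the question). rowSums counts the non-NA cells, so it only equals the position of the last score when there are no gaps; max.col locates the last non-NA column directly.

```r
# Hypothetical row with an NA between two scores
df <- data.frame(Person = 1, W.1 = 62, W.2 = NA, W.3 = 52, W.4 = 59, W.5 = NA)
rows <- seq_len(nrow(df))

n_non_na <- rowSums(!is.na(df))                         # count of non-NA cells
last_col <- max.col(!is.na(df), ties.method = "last")   # position of last non-NA cell

via_rowsums <- df[cbind(rows, n_non_na)]   # picks W.3, not the last test
via_maxcol  <- df[cbind(rows, last_col)]   # picks W.4, the last test taken
```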
Upvotes: 4
Reputation: 13304
You can use this solution:
> t(apply(d[-1],1,function(rw) rw[range(which(!is.na(rw)))]))
[,1] [,2]
[1,] 62 59
[2,] 49 60
[3,] 59 34
where d is your data set.
How it works: for each row of d (rows are scanned using apply(d[-1],1,...), where d[-1] excludes the first column), get the indices of the non-NA test results (which(!is.na(rw))), then get the lowest and highest of those indices using range(), and obtain the test scores that correspond to those indices (rw[...]). The final result is transposed using t().
Note that this solution will work properly even in the case of NAs in the middle of the test scores, e.g. c(NA, 57, NA, 52, NA).
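The steps can be traced on that example row. This is just an illustrative sketch; the element names and the variables non_na, ends and first_last are mine.

```r
# One row of scores with NAs at the start, in the middle and at the end
rw <- c(W.1 = NA, W.2 = 57, W.3 = NA, W.4 = 52, W.5 = NA)

non_na <- which(!is.na(rw))   # positions of the non-NA scores
ends <- range(non_na)         # lowest and highest of those positions
first_last <- rw[ends]        # scores at those two positions
```

Here first_last holds 57 and 52. If a row has only one non-NA score, range() returns the same position twice, so the first and last values coincide.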
Upvotes: 3