Reputation: 339
I have asked a similar question yesterday: How to select the last one test without NA in r
But I found the problem becomes difficult that there are NA exist in the first column.
My dataframe is similar like this:
Person W.1 W.2 W.3 W.4 W.5
1 NA 57 52 59 NA
2 49 NA 60 61 NA
3 34 79 NA 58 NA
Is there a way to select the first and last test without "NA". I have 300 data entries, and W.1 means the first test, W.2 means the second test, W.n means the nth test. I want to compare the score of the first test with the score of the last test, I also want to compare the score of the first test and the highest. The desired output should be like:
1 57 59
2 49 61
3 34 58
and another should be like:
1 52 59
2 49 61
3 34 79
But different persons have different places having "NA", can someone help me?
Thank you!
Upvotes: 0
Views: 86
Reputation: 92282
Here's a somewhat vectorized approach with base R
start <- cbind(seq_len(nrow(df)), max.col(!is.na(df[-1L]), ties.method = "first") + 1L)
end <- cbind(seq_len(nrow(df)), max.col(!is.na(df[-1L]), ties.method = "last") + 1L)
maxval <- do.call(pmax, c(df[-1L], na.rm = TRUE))
cbind(df[1L], start = df[start], end = df[end], maxvalue = maxval)
# Person start end maxvalue
# 1 1 57 59 59
# 2 2 49 61 61
# 3 3 34 58 79
Or you can just modify @Marats solution from your previous question as in
t(apply(df[-1], 1, function(x) c(x[range(which(!is.na(x)))], max(x, na.rm = TRUE))))
# [,1] [,2] [,3]
# [1,] 57 59 59
# [2,] 49 61 61
# [3,] 34 58 79
Upvotes: 2
Reputation: 118779
Here's using data.table
:
require(data.table)
require(reshape2)
melt(dt, id=1L, na.rm=TRUE)[, .(first=value[1L],
last=value[.N], max=max(value)), keyby=Person]
# Person first last max
# 1: 1 57 59 59
# 2: 2 49 61 61
# 3: 3 34 58 79
You can safely ignore the warning. reshape2
is necessary if you use data.table version <= 1.9.4.
Upvotes: 2
Reputation: 145755
I'd convert the data to long form and then use dplyr
:
library(tidyr)
library(dplyr)
dat.long = gather(dat, key = test, value = score, W.1:W.5)
head(dat.long)
# Person test score
# 1 1 W.1 NA
# 2 2 W.1 49
# 3 3 W.1 34
# 4 1 W.2 57
# 5 2 W.2 NA
# 6 3 W.2 79
dat.long %>% na.omit %>%
mutate(test = grep(pattern = "[0-9]", x = test)) %>%
group_by(Person) %>%
summarize(first = score[which.min(test)],
last = score[which.max(test)],
max = max(score))
# Person first last max
# 1 1 57 59 59
# 2 2 49 61 61
# 3 3 34 58 79
Combining both your desired outputs.
Upvotes: 1