William Liu
William Liu

Reputation: 339

How to select the first and last one test without NA in r

I have asked a similar question yesterday: How to select the last one test without NA in r

But I found the problem becomes difficult that there are NA exist in the first column.

My dataframe is similar like this:

Person  W.1   W.2   W.3   W.4   W.5   
1       NA    57    52    59    NA
2       49    NA    60    61    NA
3       34    79    NA    58    NA

Is there a way to select the first and last test without "NA". I have 300 data entries, and W.1 means the first test, W.2 means the second test, W.n means the nth test. I want to compare the score of the first test with the score of the last test, I also want to compare the score of the first test and the highest. The desired output should be like:

1    57 59
2    49 61
3    34 58

and another should be like:

1    52 59
2    49 61
3    34 79

But different persons have different places having "NA", can someone help me?

Thank you!

Upvotes: 0

Views: 86

Answers (3)

David Arenburg
David Arenburg

Reputation: 92282

Here's a somewhat vectorized approach with base R

start <- cbind(seq_len(nrow(df)), max.col(!is.na(df[-1L]), ties.method = "first") + 1L)
end <- cbind(seq_len(nrow(df)), max.col(!is.na(df[-1L]), ties.method = "last") + 1L)
maxval <- do.call(pmax, c(df[-1L], na.rm = TRUE))
cbind(df[1L], start = df[start], end = df[end], maxvalue = maxval)
#   Person start end maxvalue
# 1      1    57  59       59
# 2      2    49  61       61
# 3      3    34  58       79

Or you can just modify @Marats solution from your previous question as in

t(apply(df[-1], 1, function(x) c(x[range(which(!is.na(x)))], max(x, na.rm = TRUE))))
#      [,1] [,2] [,3]
# [1,]   57   59   59
# [2,]   49   61   61
# [3,]   34   58   79

Upvotes: 2

Arun
Arun

Reputation: 118779

Here's using data.table:

require(data.table)
require(reshape2)
melt(dt, id=1L, na.rm=TRUE)[, .(first=value[1L], 
         last=value[.N], max=max(value)), keyby=Person]
#    Person first last max
# 1:      1    57   59  59
# 2:      2    49   61  61
# 3:      3    34   58  79

You can safely ignore the warning. reshape2 is necessary if you use data.table version <= 1.9.4.

Upvotes: 2

Gregor Thomas
Gregor Thomas

Reputation: 145755

I'd convert the data to long form and then use dplyr:

library(tidyr)
library(dplyr)
dat.long = gather(dat, key = test, value = score, W.1:W.5)
head(dat.long)
#  Person test score
# 1      1  W.1    NA
# 2      2  W.1    49
# 3      3  W.1    34
# 4      1  W.2    57
# 5      2  W.2    NA
# 6      3  W.2    79

dat.long %>% na.omit %>%
    mutate(test = grep(pattern = "[0-9]", x = test)) %>%
    group_by(Person) %>%
    summarize(first = score[which.min(test)],
              last = score[which.max(test)],
              max = max(score))

#    Person first last max
#  1      1    57   59  59
#  2      2    49   61  61
#  3      3    34   58  79

Combining both your desired outputs.

Upvotes: 1

Related Questions