Reputation: 129

Select last two rows of each group with certain value on a variable in R

For each group (ID), I would like to select the last two rows that have a valid (i.e., non-NA) value on my outcome variable (outcome).

Sample data looks like this:

df <- read.table(text="
                      ID       outcome
                 1    800033   3
                 2    800033   3
                 3    800033   NA   
                 4    800033   2  
                 5    800033   1  
                 15   800076   2
                 16   800076   NA
                 17   800100   4     
                 18   800100   4  
                 19   800100   4  
                 20   800100   3   
                 30   800125   2   
                 31   800125   1   
                 32   800125   NA", header=TRUE)

If a participant does not have at least two valid values on my outcome variable (e.g., ID == 800076), I would still like to keep the last two rows of that group (ID). All other rows should be deleted.

My final data set would therefore look like this:

     ID       outcome
4    800033   2  
5    800033   1  
15   800076   2
16   800076   NA
19   800100   4  
20   800100   3   
30   800125   2   
31   800125   1

Any advice on how to do this is highly appreciated!

Upvotes: 0

Answers (2)

akrun

Reputation: 887901

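We can do this with dplyr: within each ID, keep every row if the group has at most two rows, otherwise drop the rows with an NA outcome, and then take the last two remaining rows.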

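library(dplyr)
df %>% 
   group_by(ID) %>% 
   # groups with at most 2 rows keep everything; larger groups keep only non-NA outcomes
   filter(n() <= 2 | !is.na(outcome)) %>%
   # take the last two remaining rows of each group
   slice(tail(row_number(), 2))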
# A tibble: 8 x 2
# Groups:   ID [4]
#      ID outcome
#   <int>   <int>
#1 800033       2
#2 800033       1
#3 800076       2
#4 800076      NA
#5 800100       4
#6 800100       3
#7 800125       2
#8 800125       1
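One caveat, offered as a hedged sketch rather than part of the answer above: `n() <= 2` counts rows, not valid values. A group with the values 5, NA, NA has three rows but only one valid outcome, so the filter would keep just one row, whereas the question asks for the last two rows in that situation. Counting the non-missing values instead should cover that case. The extra ID 800200 below is hypothetical test data, and `slice_tail()` assumes dplyr >= 1.0.0.

```r
library(dplyr)

# Sample data from the question, plus a hypothetical extra group (ID 800200)
# with three rows but only one valid outcome, to exercise the edge case
df <- data.frame(
  ID      = c(rep(800033, 5), rep(800076, 2), rep(800100, 4),
              rep(800125, 3), rep(800200, 3)),
  outcome = c(3, 3, NA, 2, 1,  2, NA,  4, 4, 4, 3,  2, 1, NA,  5, NA, NA)
)

res <- df %>%
  group_by(ID) %>%
  # groups with fewer than two valid values keep all of their rows;
  # otherwise only the rows with a non-NA outcome survive
  filter(sum(!is.na(outcome)) < 2 | !is.na(outcome)) %>%
  # take the last two remaining rows of each group
  slice_tail(n = 2) %>%
  ungroup()

res
```

For the four IDs from the question this gives the same eight rows as the answer above; the hypothetical ID 800200 contributes its last two rows (both NA).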

Upvotes: 0

Ronak Shah

Reputation: 389275

We can put an if condition inside slice: if a group has more than 2 rows, select its last two rows with a non-NA outcome; otherwise keep all rows of the group.

library(dplyr)
df %>%
  group_by(ID) %>%
  # more than 2 rows: last two non-NA outcomes; otherwise keep every row
  slice(if (n() > 2) tail(which(!is.na(outcome)), 2) else 1:n())

#      ID outcome
#   <int>   <int>
#1 800033       2
#2 800033       1
#3 800076       2
#4 800076      NA
#5 800100       4
#6 800100       3
#7 800125       2
#8 800125       1
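If you would rather avoid the dplyr dependency, the same rule can be sketched in base R with `split()` and `lapply()`. This is an alternative sketch, not part of either answer; the helper `last_two_valid` is a hypothetical name. It counts valid values rather than rows (the `n() > 2` test above would return a single row for a group like 5, NA, NA), and indexing the original data frame keeps the original row names, as in the desired output.

```r
# Sample data from the question
df <- read.table(text = "
     ID outcome
1  800033 3
2  800033 3
3  800033 NA
4  800033 2
5  800033 1
15 800076 2
16 800076 NA
17 800100 4
18 800100 4
19 800100 4
20 800100 3
30 800125 2
31 800125 1
32 800125 NA", header = TRUE)

# Hypothetical helper: positions to keep within one group.
# With at least two non-NA outcomes, keep the last two of those;
# otherwise fall back to the last two rows of the group.
last_two_valid <- function(outcome) {
  valid <- which(!is.na(outcome))
  if (length(valid) >= 2) tail(valid, 2) else tail(seq_along(outcome), 2)
}

# Split the row indices by ID, map group positions back to global rows
keep <- unlist(lapply(split(seq_len(nrow(df)), df$ID),
                      function(i) i[last_two_valid(df$outcome[i])]))
res <- df[sort(keep), ]
res
```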

Upvotes: 1
