juanli

Reputation: 613

Select the row at the first occurrence of a value above a fixed threshold, per subject, using R

Here is my repeated-measurements data frame:

subject  StartTime_month  StopTime_month  ...
1        0.0              0.5
1        0.5              1.0
1        1.0              3.0
1        3.0              6.0
1        6.0              9.6
1        9.6              12.1
2        0.0              0.5
2        0.5              1.0
2        1.0              1.9
2        1.9              3.2
2        3.2              6.2
2        6.2              8.2

For each subject, I would like to select the first row in which StopTime_month > 6.0.

Upvotes: 1

Views: 58

Answers (3)

Ronak Shah

Reputation: 388907

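With base R aggregate: keep only the rows where StopTime_month > 6, then take the first remaining value of each column per subject (this assumes the rows are already ordered by time within each subject)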

aggregate(.~subject, df[df$StopTime_month > 6, ], function(x) x[1])

#  subject StartTime_month StopTime_month
#1       1             6.0            9.6
#2       2             3.2            6.2

Upvotes: 1

maRtin

Reputation: 6516

A base R solution:

For subject 1:

df[df$subject==1 & df$StopTime_month > 6,][1,]

For subject 2:

df[df$subject==2 & df$StopTime_month > 6,][1,]

(where df is your dataframe)
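
The per-subject lines above can be generalized so that no subject number has to be typed by hand. The following is only a minimal sketch in base R: it rebuilds the example data from the question (the name df follows this answer) and assumes the rows are already sorted by time within each subject.

```r
# Example data copied from the question.
df <- data.frame(
  subject         = rep(1:2, each = 6),
  StartTime_month = c(0, 0.5, 1, 3, 6, 9.6,    0, 0.5, 1, 1.9, 3.2, 6.2),
  StopTime_month  = c(0.5, 1, 3, 6, 9.6, 12.1, 0.5, 1, 1.9, 3.2, 6.2, 8.2)
)

# Step 1: keep only the rows past the threshold.
over6 <- df[df$StopTime_month > 6, ]

# Step 2: duplicated() flags every repeat of a subject after its first row,
# so negating it keeps exactly the first qualifying row per subject.
first_over6 <- over6[!duplicated(over6$subject), ]
first_over6
```

One difference worth noting: a subject with no StopTime_month above 6 simply drops out of first_over6, whereas the [1,] indexing above would return a row of NA values for that subject.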

Upvotes: 0

akrun

Reputation: 887048

We can try with data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'subject', get the row index of the first instance where 'StopTime_month' is greater than 6, and use that index to subset the rows:

library(data.table)
setDT(df1)[df1[,  .I[which(StopTime_month > 6)[1]], by = subject]$V1]
#   subject StartTime_month StopTime_month
#1:       1             6.0            9.6
#2:       2             3.2            6.2
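
For readers unfamiliar with .I, the same "row index of the first instance per group" idea can be sketched in base R. This is an illustration only, not a replacement for the data.table code; the data are rebuilt from the question and the names rows, first_idx and first_rows are made up for this example.

```r
# Example data copied from the question.
df1 <- data.frame(
  subject         = rep(1:2, each = 6),
  StartTime_month = c(0, 0.5, 1, 3, 6, 9.6,    0, 0.5, 1, 1.9, 3.2, 6.2),
  StopTime_month  = c(0.5, 1, 3, 6, 9.6, 12.1, 0.5, 1, 1.9, 3.2, 6.2, 8.2)
)

# Row positions in the full table, playing the role of data.table's .I.
rows <- seq_len(nrow(df1))

# For each subject, pick the position of its first row with StopTime_month > 6.
first_idx <- as.vector(tapply(rows, df1$subject, function(i) {
  i[which(df1$StopTime_month[i] > 6)[1]]
}))

# Row positions 5 and 11 for this data; use them to subset the table.
first_rows <- df1[first_idx, ]
first_rows
```

If a subject never exceeds 6, which(...)[1] is NA, so that subject yields an NA index and hence an all-NA row; filtering first_idx with !is.na() beforehand avoids that.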

Suppose we need all the rows up to and including the first instance of 'StopTime_month' greater than 6:

setDT(df1)[, .SD[cumsum(StopTime_month > 6)<2], by = subject]
#     subject StartTime_month StopTime_month
# 1:       1             0.0            0.5
# 2:       1             0.5            1.0
# 3:       1             1.0            3.0
# 4:       1             3.0            6.0
# 5:       1             6.0            9.6
# 6:       2             0.0            0.5
# 7:       2             0.5            1.0
# 8:       2             1.0            1.9
# 9:       2             1.9            3.2
#10:       2             3.2            6.2
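
The cumsum condition in the snippet above is the key step: within each subject it counts how many rows so far exceed 6, and rows are kept while that count is below 2. A rough base R equivalent, using ave() for the per-subject running count, might look like the sketch below (data rebuilt from the question; n_over6 and upto_first are names invented here).

```r
# Example data copied from the question.
df1 <- data.frame(
  subject         = rep(1:2, each = 6),
  StartTime_month = c(0, 0.5, 1, 3, 6, 9.6,    0, 0.5, 1, 1.9, 3.2, 6.2),
  StopTime_month  = c(0.5, 1, 3, 6, 9.6, 12.1, 0.5, 1, 1.9, 3.2, 6.2, 8.2)
)

# Running count of rows over the threshold, restarted for each subject.
n_over6 <- ave(as.integer(df1$StopTime_month > 6), df1$subject, FUN = cumsum)

# Keep rows seen before a second over-threshold row appears, i.e. everything
# up to and including the first row with StopTime_month > 6.
upto_first <- df1[n_over6 < 2, ]
upto_first
```

Like the data.table version, this relies on StopTime_month increasing within each subject; if a later row dropped back to 6 or below before a second value above 6 appeared, that row would be kept as well.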

Or, to get just the first matching row per subject, using dplyr

library(dplyr)
df1 %>% 
   filter(StopTime_month > 6) %>%
   group_by(subject) %>% 
   slice(1L)
#   subject StartTime_month StopTime_month
#     <int>           <dbl>          <dbl>
#1       1             6.0            9.6
#2       2             3.2            6.2

Upvotes: 3
