Reputation: 369
After grouping the data by ID, I want to get the ones that contain at least 2 values after F2 in the Visit column.
Let me show it in the example below. "dt" is my input and "ot" is my output. I create "ot", because ID 102 has only F3 after F2, aka this ID doesnot have at least 2 visits after F2. If it had visits F3, F4, I'd select ID 102 as well.
Thanks for your help.
dt<-data.frame(ID=c(100,100,100,100,100,100,100,
101,101,101,101,101,101,
102,102,102,102),
Visit=c("Bsl", "F1", "F2", "F3", "F4", "F5", "F6",
"Bsl", "F1", "F2", "F3", "F4", "F5",
"Bsl", "F1", "F2", "F3"),
X1=c(21,22,28,26,33,34,37,
44,43,48,50,52,57,
69,67,66, 70))
ot<-data.frame(ID=c(100,100,100,100,100,100,100,
101,101,101,101,101,101),
Visit=c("Bsl", "F1", "F2", "F3", "F4", "F5", "F6",
"Bsl", "F1", "F2", "F3", "F4", "F5"),
X1=c(21,22,28,26,33,34,37,
44,43,48,50,52,57))
Upvotes: 2
Views: 139
Reputation: 21918
You can also use the following solution. This time we we row_number
instead of cumsum
:
library(dplyr)
dt %>%
group_by(ID) %>%
mutate(id = row_number()) %>%
filter(last(id) - id[Visit == "F2"] >= 2) %>%
select(-id)
# A tibble: 13 x 3
# Groups: ID [2]
ID Visit X1
<dbl> <chr> <dbl>
1 100 Bsl 21
2 100 F1 22
3 100 F2 28
4 100 F3 26
5 100 F4 33
6 100 F5 34
7 100 F6 37
8 101 Bsl 44
9 101 F1 43
10 101 F2 48
11 101 F3 50
12 101 F4 52
13 101 F5 57
Upvotes: 0
Reputation: 26218
Proceed like this
Visit == 'F2
returns True in the row.cumsum
over it will return at least 1
or greater number (if there are multiple F2 in that group) in all rows after that row.F1
)library(dplyr)
dt %>% group_by(ID) %>%
filter(sum(cumsum(Visit == 'F2')) > 2)
#> # A tibble: 13 x 3
#> # Groups: ID [2]
#> ID Visit X1
#> <dbl> <chr> <dbl>
#> 1 100 Bsl 21
#> 2 100 F1 22
#> 3 100 F2 28
#> 4 100 F3 26
#> 5 100 F4 33
#> 6 100 F5 34
#> 7 100 F6 37
#> 8 101 Bsl 44
#> 9 101 F1 43
#> 10 101 F2 48
#> 11 101 F3 50
#> 12 101 F4 52
#> 13 101 F5 57
Created on 2021-05-31 by the reprex package (v2.0.0)
Upvotes: 3
Reputation: 8880
data.table
library(data.table)
dt<-data.frame(ID=c(100,100,100,100,100,100,100,
101,101,101,101,101,101,
102,102,102,102),
Visit=c("Bsl", "F1", "F2", "F3", "F4", "F5", "F6",
"Bsl", "F1", "F2", "F3", "F4", "F5",
"Bsl", "F1", "F2", "F3"),
X1=c(21,22,28,26,33,34,37,
44,43,48,50,52,57,
69,67,66, 70))
setDT(dt)[, .SD[sum(cumsum(Visit == "F2")) - 1 > 1], by = ID]
#> ID Visit X1
#> 1: 100 Bsl 21
#> 2: 100 F1 22
#> 3: 100 F2 28
#> 4: 100 F3 26
#> 5: 100 F4 33
#> 6: 100 F5 34
#> 7: 100 F6 37
#> 8: 101 Bsl 44
#> 9: 101 F1 43
#> 10: 101 F2 48
#> 11: 101 F3 50
#> 12: 101 F4 52
#> 13: 101 F5 57
Created on 2021-05-31 by the reprex package (v2.0.0)
Upvotes: 0
Reputation: 388982
You can use match
to get the index where 'F2'
occurs add + 2 to that number and select an ID
if that is less than number of rows in the group.
library(dplyr)
dt %>%
group_by(ID) %>%
filter(match('F2', Visit) + 2 < n()) %>%
ungroup
# ID Visit X1
# <dbl> <chr> <dbl>
# 1 100 Bsl 21
# 2 100 F1 22
# 3 100 F2 28
# 4 100 F3 26
# 5 100 F4 33
# 6 100 F5 34
# 7 100 F6 37
# 8 101 Bsl 44
# 9 101 F1 43
#10 101 F2 48
#11 101 F3 50
#12 101 F4 52
#13 101 F5 57
Upvotes: 1