Reputation: 369

how to select specific part of the grouped data in R?

After grouping the data by ID, I want to get the ones that contain at least 2 values after F2 in the Visit column.

Let me show it in the example below. "dt" is my input and "ot" is my output. I create "ot", because ID 102 has only F3 after F2, aka this ID doesnot have at least 2 visits after F2. If it had visits F3, F4, I'd select ID 102 as well.

Thanks for your help.

dt<-data.frame(ID=c(100,100,100,100,100,100,100,
                    101,101,101,101,101,101,
                    102,102,102,102),
               Visit=c("Bsl", "F1", "F2", "F3", "F4", "F5", "F6",
                       "Bsl", "F1", "F2", "F3", "F4", "F5",
                       "Bsl", "F1", "F2", "F3"),
               X1=c(21,22,28,26,33,34,37, 
                    44,43,48,50,52,57,
                    69,67,66, 70))

ot<-data.frame(ID=c(100,100,100,100,100,100,100,
                    101,101,101,101,101,101),
               Visit=c("Bsl", "F1", "F2", "F3", "F4", "F5", "F6",
                       "Bsl", "F1", "F2", "F3", "F4", "F5"),
               X1=c(21,22,28,26,33,34,37, 
                    44,43,48,50,52,57))

Upvotes: 2

Answers (4)

Anoushiravan R

Reputation: 21918

You can also use the following solution. This time we we row_number instead of cumsum:

library(dplyr)

dt %>%
  group_by(ID) %>%
  mutate(id = row_number()) %>%
  filter(last(id) - id[Visit == "F2"] >= 2) %>%
  select(-id)

# A tibble: 13 x 3
# Groups:   ID [2]
      ID Visit    X1
   <dbl> <chr> <dbl>
 1   100 Bsl      21
 2   100 F1       22
 3   100 F2       28
 4   100 F3       26
 5   100 F4       33
 6   100 F5       34
 7   100 F6       37
 8   101 Bsl      44
 9   101 F1       43
10   101 F2       48
11   101 F3       50
12   101 F4       52
13   101 F5       57

Upvotes: 0

AnilGoyal

Reputation: 26218

Proceed like this

Visit == 'F2 returns True in the row.
cumsum over it will return at least 1 or greater number (if there are multiple F2 in that group) in all rows after that row.
Sum will return final total sum (scalar) of number of rows including and after that specific row, for that group.
Now we will have to check only those groups where this sum is greater than 2 (i.e. at least 2 rows after first F1)

library(dplyr)

dt %>% group_by(ID) %>%
  filter(sum(cumsum(Visit == 'F2')) > 2)

#> # A tibble: 13 x 3
#> # Groups:   ID [2]
#>       ID Visit    X1
#>    <dbl> <chr> <dbl>
#>  1   100 Bsl      21
#>  2   100 F1       22
#>  3   100 F2       28
#>  4   100 F3       26
#>  5   100 F4       33
#>  6   100 F5       34
#>  7   100 F6       37
#>  8   101 Bsl      44
#>  9   101 F1       43
#> 10   101 F2       48
#> 11   101 F3       50
#> 12   101 F4       52
#> 13   101 F5       57

^{Created on 2021-05-31 by the reprex package (v2.0.0)}

Upvotes: 3

Yuriy Saraykin

Reputation: 8880

data.table

library(data.table)
dt<-data.frame(ID=c(100,100,100,100,100,100,100,
                    101,101,101,101,101,101,
                    102,102,102,102),
               Visit=c("Bsl", "F1", "F2", "F3", "F4", "F5", "F6",
                       "Bsl", "F1", "F2", "F3", "F4", "F5",
                       "Bsl", "F1", "F2", "F3"),
               X1=c(21,22,28,26,33,34,37, 
                    44,43,48,50,52,57,
                    69,67,66, 70))

setDT(dt)[, .SD[sum(cumsum(Visit == "F2")) - 1 > 1], by = ID]
#>      ID Visit X1
#>  1: 100   Bsl 21
#>  2: 100    F1 22
#>  3: 100    F2 28
#>  4: 100    F3 26
#>  5: 100    F4 33
#>  6: 100    F5 34
#>  7: 100    F6 37
#>  8: 101   Bsl 44
#>  9: 101    F1 43
#> 10: 101    F2 48
#> 11: 101    F3 50
#> 12: 101    F4 52
#> 13: 101    F5 57

^{Created on 2021-05-31 by the reprex package (v2.0.0)}

Upvotes: 0

Ronak Shah

Reputation: 388982

You can use match to get the index where 'F2' occurs add + 2 to that number and select an ID if that is less than number of rows in the group.

library(dplyr)

dt %>%
  group_by(ID) %>%
  filter(match('F2', Visit) + 2 < n()) %>%
  ungroup

#      ID Visit    X1
#   <dbl> <chr> <dbl>
# 1   100 Bsl      21
# 2   100 F1       22
# 3   100 F2       28
# 4   100 F3       26
# 5   100 F4       33
# 6   100 F5       34
# 7   100 F6       37
# 8   101 Bsl      44
# 9   101 F1       43
#10   101 F2       48
#11   101 F3       50
#12   101 F4       52
#13   101 F5       57

Upvotes: 1

how to select specific part of the grouped data in R?

Answers (4)

Related Questions