Melih Aras
Melih Aras

Reputation: 369

how to select specific part of the grouped data in R?

After grouping the data by ID, I want to get the ones that contain at least 2 values after F2 in the Visit column.

Let me show it in the example below. "dt" is my input and "ot" is my output. I create "ot", because ID 102 has only F3 after F2, aka this ID doesnot have at least 2 visits after F2. If it had visits F3, F4, I'd select ID 102 as well.

Thanks for your help.

dt<-data.frame(ID=c(100,100,100,100,100,100,100,
                    101,101,101,101,101,101,
                    102,102,102,102),
               Visit=c("Bsl", "F1", "F2", "F3", "F4", "F5", "F6",
                       "Bsl", "F1", "F2", "F3", "F4", "F5",
                       "Bsl", "F1", "F2", "F3"),
               X1=c(21,22,28,26,33,34,37, 
                    44,43,48,50,52,57,
                    69,67,66, 70))

ot<-data.frame(ID=c(100,100,100,100,100,100,100,
                    101,101,101,101,101,101),
               Visit=c("Bsl", "F1", "F2", "F3", "F4", "F5", "F6",
                       "Bsl", "F1", "F2", "F3", "F4", "F5"),
               X1=c(21,22,28,26,33,34,37, 
                    44,43,48,50,52,57))

Upvotes: 2

Views: 139

Answers (4)

Anoushiravan R
Anoushiravan R

Reputation: 21918

You can also use the following solution. This time we we row_number instead of cumsum:

library(dplyr)

dt %>%
  group_by(ID) %>%
  mutate(id = row_number()) %>%
  filter(last(id) - id[Visit == "F2"] >= 2) %>%
  select(-id)

# A tibble: 13 x 3
# Groups:   ID [2]
      ID Visit    X1
   <dbl> <chr> <dbl>
 1   100 Bsl      21
 2   100 F1       22
 3   100 F2       28
 4   100 F3       26
 5   100 F4       33
 6   100 F5       34
 7   100 F6       37
 8   101 Bsl      44
 9   101 F1       43
10   101 F2       48
11   101 F3       50
12   101 F4       52
13   101 F5       57

Upvotes: 0

AnilGoyal
AnilGoyal

Reputation: 26218

Proceed like this

  • Visit == 'F2 returns True in the row.
  • cumsum over it will return at least 1 or greater number (if there are multiple F2 in that group) in all rows after that row.
  • Sum will return final total sum (scalar) of number of rows including and after that specific row, for that group.
  • Now we will have to check only those groups where this sum is greater than 2 (i.e. at least 2 rows after first F1)
library(dplyr)

dt %>% group_by(ID) %>%
  filter(sum(cumsum(Visit == 'F2')) > 2)

#> # A tibble: 13 x 3
#> # Groups:   ID [2]
#>       ID Visit    X1
#>    <dbl> <chr> <dbl>
#>  1   100 Bsl      21
#>  2   100 F1       22
#>  3   100 F2       28
#>  4   100 F3       26
#>  5   100 F4       33
#>  6   100 F5       34
#>  7   100 F6       37
#>  8   101 Bsl      44
#>  9   101 F1       43
#> 10   101 F2       48
#> 11   101 F3       50
#> 12   101 F4       52
#> 13   101 F5       57

Created on 2021-05-31 by the reprex package (v2.0.0)

Upvotes: 3

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

data.table

library(data.table)
dt<-data.frame(ID=c(100,100,100,100,100,100,100,
                    101,101,101,101,101,101,
                    102,102,102,102),
               Visit=c("Bsl", "F1", "F2", "F3", "F4", "F5", "F6",
                       "Bsl", "F1", "F2", "F3", "F4", "F5",
                       "Bsl", "F1", "F2", "F3"),
               X1=c(21,22,28,26,33,34,37, 
                    44,43,48,50,52,57,
                    69,67,66, 70))

setDT(dt)[, .SD[sum(cumsum(Visit == "F2")) - 1 > 1], by = ID]
#>      ID Visit X1
#>  1: 100   Bsl 21
#>  2: 100    F1 22
#>  3: 100    F2 28
#>  4: 100    F3 26
#>  5: 100    F4 33
#>  6: 100    F5 34
#>  7: 100    F6 37
#>  8: 101   Bsl 44
#>  9: 101    F1 43
#> 10: 101    F2 48
#> 11: 101    F3 50
#> 12: 101    F4 52
#> 13: 101    F5 57

Created on 2021-05-31 by the reprex package (v2.0.0)

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388982

You can use match to get the index where 'F2' occurs add + 2 to that number and select an ID if that is less than number of rows in the group.

library(dplyr)

dt %>%
  group_by(ID) %>%
  filter(match('F2', Visit) + 2 < n()) %>%
  ungroup

#      ID Visit    X1
#   <dbl> <chr> <dbl>
# 1   100 Bsl      21
# 2   100 F1       22
# 3   100 F2       28
# 4   100 F3       26
# 5   100 F4       33
# 6   100 F5       34
# 7   100 F6       37
# 8   101 Bsl      44
# 9   101 F1       43
#10   101 F2       48
#11   101 F3       50
#12   101 F4       52
#13   101 F5       57

Upvotes: 1

Related Questions