Ram
Ram

Reputation: 69

R, how to group by row value? Split?

I have a dataframe with a vector in the follwing format I want to keep all rows, within the "START" and "STOP" values in the column val (including them), and remove rows that is not inside start-stop

id = str_c("x",1:20)
val = c(rep("NO1", 3), "START", rep("yes1", 3), "STOP", 
    "NO2", "START", rep("yes2", 3), "STOP", rep("NO3",6))
data = data.frame(id,val)

My expected output is

data = data.frame(id = c(str_c("x", 4:8), str_c("x", 10:14)),
              val = c("START", rep("yes1", 3), "STOP",
                      "START", rep("yes2", 3), "STOP"))

Upvotes: 1

Views: 214

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388797

You can get the index of 'START' and 'STOP' values and create a sequence between them with mapply and select those rows from data.

ind1 <- which(data$val == 'START')
ind2 <- which(data$val == 'STOP')

data[sort(unique(unlist(mapply(`:`, ind1, ind2)))), ]

#    id   val
#4   x4 START
#5   x5  yes1
#6   x6  yes1
#7   x7  yes1
#8   x8  STOP
#10 x10 START
#11 x11  yes2
#12 x12  yes2
#13 x13  yes2
#14 x14  STOP

Upvotes: 1

Karthik S
Karthik S

Reputation: 11584

Does this work:

> library(dplyr)
> data %>% mutate(ID = substr(val, nchar(val), nchar(val))) %>% 
+   mutate(ID = case_when(ID == 'T' ~ lead(ID), ID == 'P' ~ lag(ID), TRUE ~ ID)) %>% 
+   group_by(ID) %>% filter(all(c('START','STOP') %in% val)) %>% 
+   filter(row_number() %in% (which(val == 'START'): which(val == 'STOP'))) %>% ungroup() %>% select(-3)
# A tibble: 10 x 2
   id    val  
   <chr> <chr>
 1 x4    START
 2 x5    yes1 
 3 x6    yes1 
 4 x7    yes1 
 5 x8    STOP 
 6 x10   START
 7 x11   yes2 
 8 x12   yes2 
 9 x13   yes2 
10 x14   STOP 

Upvotes: 1

at80
at80

Reputation: 780

Depending on the size of your data you could just do a brute force solution using a true false switch. You can return the row id's instead of the dataframe subset in the return output if you want.

parse_dataframe <- function(df, colnm='val'){
  idx = c()
  start <- FALSE
  
  for (i in 1:nrow(data)){
    if (df[i, colnm] == 'START') start <- TRUE
    if (start) idx <- c(idx, i)
    if (df[i, colnm] == 'STOP') start <- FALSE
  }
  return(df[idx, ])
}

parse_dataframe(data)
#    id   val
#4   x4 START
#5   x5  yes1
#6   x6  yes1
#7   x7  yes1
#8   x8  STOP
#10 x10 START
#11 x11  yes2
#12 x12  yes2
#13 x13  yes2
#14 x14  STOP

Upvotes: 1

Related Questions