dgssd

Reputation: 53

Advanced row deletion in R

I want to delete rows in R based on selection logic that goes beyond a simple subset. Here is some sample code, followed by what I need to do:

v1 <- c(1:11)
v2 <- c('a','a','b','b','b','b','c','c','c','c','c')
v3 <- c(3,13,14,13,14,9,14,13,14,13,14)
v4 <- c('','x','','','','x','','','','','x')
v5 <- c('','x','','y','','x','','y','','y','x')

test.df <- data.frame(v1,v2,v3,v4,v5)
names(test.df) <- c('id','level','number','end_flag','logic_flag')

What I want to do is remove, within each level, all the rows below the first row where logic_flag equals 'y'.

So in this case, the end result should remove no rows for level a, rows 5 and 6 for level b, and rows 9, 10 and 11 for level c.

Basically, I want to set end_flag to 'x' at the first 13 that comes up in the number column for each level, and then delete all the rows for that level below that end_flag = 'x'. Let me know if this makes sense, as I need to clean this part up before proceeding with the rest of my code!

Thanks!

Upvotes: 2

Views: 168

Answers (2)

thelatemail

Reputation: 93938

Base R using cumsum twice:

# TRUE up to and including the first TRUE in x, FALSE for everything after it
posty <- function(x) cumsum(cumsum(x))<=1
test.df[with(test.df, ave(logic_flag=="y", level, FUN=posty)),]

#  id level number end_flag logic_flag
#1  1     a      3                    
#2  2     a     13        x          x
#3  3     b     14                    
#4  4     b     13                   y
#7  7     c     14                    
#8  8     c     13                   y
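
To see why cumsum is applied twice, it may help to trace the intermediate values for a single level. The following is only an illustrative sketch using the logic flags of level c from the question; the variable names (flag, seen, twice, keep) are made up for the walk-through and are not part of the answer above.

```r
# logic_flag == "y" for level c (rows 7 to 11 of the sample data)
flag <- c(FALSE, TRUE, FALSE, TRUE, FALSE)

# First cumsum: how many "y" rows have been seen so far, this row included
seen <- cumsum(flag)      # 0 1 1 2 2

# Second cumsum: running total of that count; it is exactly 1 on the first "y"
# row and already exceeds 1 on the very next row
twice <- cumsum(seen)     # 0 1 2 4 6

# Keep rows up to and including the first "y"
keep <- twice <= 1        # TRUE TRUE FALSE FALSE FALSE
print(keep)
```

A single cumsum(flag) <= 1 would not be enough here, because it would also keep row 9, which sits between the first and the second "y".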

Upvotes: 3

Rorschach

Reputation: 32466

Using dplyr, you can do:

library(dplyr)
test.df %>% group_by(level) %>%
  filter(head(cumsum(c(FALSE, logic_flag == 'y')) == 0, -1))
#   id level number end_flag logic_flag
# 1  1     a      3                    
# 2  2     a     13        x          x
# 3  3     b     14                    
# 4  4     b     13                   y
# 5  7     c     14                    
# 6  8     c     13                   y

First, group by level, then remove rows where we have already seen a "y" (using cumsum). A FALSE is prepended to the logical vector before taking the cumsum, which shifts everything down by one row, because we want to include the first row that contains a "y". Since this increases the length of the vector by 1, head(..., -1) is used to drop the last element. dplyr's lag() function could probably do a similar thing as well.
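
As a rough sketch of that lag-based variant (not tested beyond the sample data from the question, which is rebuilt here so the snippet runs on its own): dplyr::lag() with default = FALSE shifts the flag down one row within each group, which plays the same role as prepending FALSE and dropping the last element. The name result is just a placeholder.

```r
library(dplyr)

# Sample data from the question
test.df <- data.frame(
  id         = 1:11,
  level      = c('a','a','b','b','b','b','c','c','c','c','c'),
  number     = c(3,13,14,13,14,9,14,13,14,13,14),
  end_flag   = c('','x','','','','x','','','','','x'),
  logic_flag = c('','x','','y','','x','','y','','y','x')
)

result <- test.df %>%
  group_by(level) %>%
  # A row is dropped only if a 'y' appeared strictly above it in its level
  filter(cumsum(dplyr::lag(logic_flag == 'y', default = FALSE)) == 0) %>%
  ungroup()

print(result$id)
```

On the sample data this should keep the same rows as the head()/cumsum() version, namely ids 1, 2, 3, 4, 7 and 8.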

Upvotes: 2
