aju_k

Reputation: 183

R delete rows based on values in previous rows

I am new to R and trying to delete rows based on values of previous rows. Sample data:

Cust_ID | Date                 | Value
500219  | 2016-04-11 12:00:00  | 0
500219  | 2016-04-12 16:00:00  | 0
500219  | 2016-04-14 11:00:00  | 1
500219  | 2016-04-15 12:00:00  | 1
500219  | 2016-05-23 09:00:00  | 0
500219  | 2016-05-02 19:00:00  | 0
500220  | 2016-04-11 12:00:00  | 0
500220  | 2016-04-14 11:00:00  | 1
500220  | 2016-04-15 12:00:00  | 1
500220  | 2016-05-23 09:00:00  | 0
500220  | 2016-05-02 19:00:00  | 0
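For anyone who wants to run the answers below, the sample table can be rebuilt roughly like this. This is only a sketch: the column types (integer IDs and values, dates kept as character strings) are an assumption based on the printed output in the answers, and the object name `df1` follows the first answer (the later answers call the same data `df`).

```r
# Rebuild the sample data from the question.
# Assumption: Date is stored as plain character, not POSIXct.
df1 <- data.frame(
  Cust_ID = c(rep(500219L, 6), rep(500220L, 5)),
  Date = c("2016-04-11 12:00:00", "2016-04-12 16:00:00",
           "2016-04-14 11:00:00", "2016-04-15 12:00:00",
           "2016-05-23 09:00:00", "2016-05-02 19:00:00",
           "2016-04-11 12:00:00", "2016-04-14 11:00:00",
           "2016-04-15 12:00:00", "2016-05-23 09:00:00",
           "2016-05-02 19:00:00"),
  Value = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)
```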

I would like to keep only the rows up to and including the last Value = 1 for each Cust_ID, giving the result:

Cust_ID | Date                 | Value
500219  | 2016-04-11 12:00:00  | 0
500219  | 2016-04-12 16:00:00  | 0
500219  | 2016-04-14 11:00:00  | 1
500219  | 2016-04-15 12:00:00  | 1
500220  | 2016-04-11 12:00:00  | 0
500220  | 2016-04-14 11:00:00  | 1
500220  | 2016-04-15 12:00:00  | 1

Any help would be appreciated!

Upvotes: 2

Views: 1670

Answers (3)

akrun

Reputation: 887153

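We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1). Grouped by 'Cust_ID', if any 'Value' is 1 we take the row indices (.I) from the first row of the group up to the last position where 'Value' is 1; otherwise we keep all of the group's row indices. The resulting indices (column V1) are then used to subset the rows of the data.table.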

library(data.table)
setDT(df1)[df1[,  if(any(Value == 1)) .I[seq(max(which(Value == 1)))]
                                 else .I[1:.N] , by = Cust_ID]$V1]
#      Cust_ID                Date Value
#1:  500219 2016-04-11 12:00:00     0
#2:  500219 2016-04-12 16:00:00     0
#3:  500219 2016-04-14 11:00:00     1
#4:  500219 2016-04-15 12:00:00     1
#5:  500220 2016-04-11 12:00:00     0
#6:  500220 2016-04-14 11:00:00     1
#7:  500220 2016-04-15 12:00:00     1

Or, using a similar approach with dplyr:

library(dplyr)
df1 %>% 
     group_by(Cust_ID) %>% 
     slice(if(any(Value==1)) seq(max(which(Value==1))) else row_number())
#   Cust_ID                Date Value
#     <int>               <chr> <int>
#1  500219 2016-04-11 12:00:00     0
#2  500219 2016-04-12 16:00:00     0
#3  500219 2016-04-14 11:00:00     1
#4  500219 2016-04-15 12:00:00     1
#5  500220 2016-04-11 12:00:00     0
#6  500220 2016-04-14 11:00:00     1
#7  500220 2016-04-15 12:00:00     1

Upvotes: 2

PhilC

Reputation: 787

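Looping approach (note that customers whose Value is never 1 are dropped entirely):

cust <- 0
keep <- FALSE
## one flag per row; a new logical vector starts out as all FALSE
keepers <- vector(mode = "logical", length = nrow(df))

## walk through the data frame backwards
for (rec in rev(seq_len(nrow(df))))
{
  ## have we been working with this customer?
  if (df$Cust_ID[rec] == cust)
  {
    ## keep a 1, and every earlier row once a 1 has been seen
    if (df$Value[rec] == 1 || keep)
    {
      keepers[rec] <- TRUE
      keep <- TRUE
    }
  }
  else
  {
    ## last row of a new customer: reset the state
    cust <- df$Cust_ID[rec]
    if (df$Value[rec] == 1)
    {
      keepers[rec] <- TRUE
      keep <- TRUE
    }
    else
    {
      keep <- FALSE
    }
  }
}

df <- df[keepers,]
df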
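The backward walk can also be written without an explicit loop. The following is a minimal base R sketch, not part of the original answer: it reverses each customer's values, takes a running maximum, and reverses back, so a row is flagged when there is a 1 at that row or anywhere after it. The names `one_at_or_after` and `kept` are made up for illustration, and the small data frame just repeats the values from the question.

```r
# Stand-in for the sample data (values copied from the question).
df <- data.frame(
  Cust_ID = c(rep(500219L, 6), rep(500220L, 5)),
  Value   = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L)
)

# Per customer: 1 if this row or any later row has Value == 1, else 0.
# rev() + cummax() + rev() plays the role of walking backwards.
one_at_or_after <- ave(
  df$Value, df$Cust_ID,
  FUN = function(v) rev(cummax(rev(as.integer(v == 1))))
)

# Keep only the flagged rows.
kept <- df[one_at_or_after == 1, ]
```

Like the loop, this drops every row of a customer whose Value is never 1.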

Upvotes: 0

lmo

Reputation: 38510

Here is a split-apply-combine method that keeps every row whose Value is 1, as well as the rows before the first 1, for each customer.

# split data by customer ID
myList <- split(df, df$Cust_ID)
# loop through ID list, keep the desired rows, rbind resulting list
dfNew <- do.call(rbind, lapply(myList, function(i) {
                               # positions of the rows where Value is 1
                               drop <- which(i$Value==1)
                               # rows up to the first 1, plus any later 1s
                               i[c(1:drop[1], drop[-1]),]}))

which returns

dfNew
         Cust_ID                   Date Value
500219.1  500219  2016-04-11 12:00:00       0
500219.2  500219  2016-04-12 16:00:00       0
500219.3  500219  2016-04-14 11:00:00       1
500219.4  500219  2016-04-15 12:00:00       1
500220.7  500220  2016-04-11 12:00:00       0
500220.8  500220  2016-04-14 11:00:00       1
500220.9  500220  2016-04-15 12:00:00       1

Note that this solution throws an error if there are customer IDs that never have a value equal to 1, because drop[1] is then NA.
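A small sketch of why that happens, using a made-up customer with only zeros (the vector `vals` and the name `res` are hypothetical, chosen just for this illustration):

```r
# Hypothetical customer whose Value is never 1.
vals <- c(0, 0, 0)

# which() finds no matches, so the result has length 0 ...
drop <- which(vals == 1)

# ... and indexing its first element gives NA.
first_one <- drop[1]

# 1:NA is an error; capture the message instead of stopping the script.
res <- tryCatch(1:first_one, error = function(e) conditionMessage(e))
```

Here `res` ends up holding the error message rather than a sequence of row positions.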


If you want to retain customers that never reach the 1 threshold, then use

dfNew <- do.call(rbind, lapply(myList, function(i) {
                               drop <- which(i$Value==1)
                               if(length(drop) != 0) i[c(1:drop[1], drop[-1]),]
                               else i}))
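One thing to be aware of: this rule and the one in the data.table/dplyr answer agree on the sample data, but they can differ when a customer's Value goes back to 0 between two 1s. A quick sketch on a made-up vector (not from the question; `first_one_rule` and `last_one_rule` are illustrative names):

```r
# Hypothetical customer with a 0 between two 1s.
v <- c(0, 1, 0, 1, 0)
ones <- which(v == 1)

# Rule in this answer: rows up to the first 1, plus any later rows that are 1.
first_one_rule <- c(1:ones[1], ones[-1])

# Rule in the data.table/dplyr answer: every row up to the last 1.
last_one_rule <- seq(max(ones))
```

With this input the first rule selects positions 1, 2 and 4, while the second selects positions 1 through 4, so which one fits depends on whether the 0 in between should be kept.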

Upvotes: 2
