Valeria Arango
Valeria Arango

Reputation: 327

Delete all rows with a certain consecutive value in a given column in R

I have a dataframe, let's call it DF, which I need to remove certain rows. The data frame is very similar to the following example.

|Symbol |   Date    | Volume |
|----------------------------|
|A      |2014-01-01 | 0      |
|A      |2014-01-02 | 0      |
|A      |2014-01-03 | 0      |
|A      |2014-01-04 | 1      |
|B      |2014-01-01 |45      |
|B      |2014-01-02 |0       |
|B      |2014-01-03 |34      |
|B      |2014-01-04 |45      |

For example, I need to delete all the rows containing the symbol A if in the column Volume there is an equal or greater than 365, continuous amount of zeros with respect to the date, that is, if for the symbol A there are 365 consecutive zeros in the box volume, if the data are organized chronologically, then I must delete all the rows containing the symbol A; and so on for all the symbols in that column.I hope to make myself understood.

I also need to count how many and which symbols I must eliminate under that criterion and I have not yet succeeded. I have tried with this line of code:

count(which(sum(DF$Volume==0)>365))

Upvotes: 0

Views: 340

Answers (1)

Gregor Thomas
Gregor Thomas

Reputation: 146249

With dplyr and the rleid helper function from data.table, we will add a grouping ID for consecutive values within a Symbol group, and count the rows within that grouping:

library(dplyr)
DF <- DF %>% 
  group_by(Symbol) %>%
  mutate(consec_id = data.table::rleid(Volume)) %>%
  group_by(Symbol, consec_id) %>%
  mutate(n_consec = n())

# count how many to be eliminated
DF %>% group_by(symbol) %>%
  summarize(sum(ifelse(any(n_consec > 365 & Volume == 0), 1, 0)))

# actually removing them
DF %>% group_by(symbol) %>%
 filter(!any(n_consec > 365 & Volume == 0))

Upvotes: 2

Related Questions