Reputation: 327
I have a dataframe, let's call it DF, which I need to remove certain rows. The data frame is very similar to the following example.
|Symbol | Date | Volume |
|----------------------------|
|A |2014-01-01 | 0 |
|A |2014-01-02 | 0 |
|A |2014-01-03 | 0 |
|A |2014-01-04 | 1 |
|B |2014-01-01 |45 |
|B |2014-01-02 |0 |
|B |2014-01-03 |34 |
|B |2014-01-04 |45 |
For example, I need to delete all the rows containing the symbol A if in the column Volume there is an equal or greater than 365, continuous amount of zeros with respect to the date, that is, if for the symbol A there are 365 consecutive zeros in the box volume, if the data are organized chronologically, then I must delete all the rows containing the symbol A; and so on for all the symbols in that column.I hope to make myself understood.
I also need to count how many and which symbols I must eliminate under that criterion and I have not yet succeeded. I have tried with this line of code:
count(which(sum(DF$Volume==0)>365))
Upvotes: 0
Views: 340
Reputation: 146249
With dplyr
and the rleid
helper function from data.table
, we will add a grouping ID for consecutive values within a Symbol
group, and count the rows within that grouping:
library(dplyr)
DF <- DF %>%
group_by(Symbol) %>%
mutate(consec_id = data.table::rleid(Volume)) %>%
group_by(Symbol, consec_id) %>%
mutate(n_consec = n())
# count how many to be eliminated
DF %>% group_by(symbol) %>%
summarize(sum(ifelse(any(n_consec > 365 & Volume == 0), 1, 0)))
# actually removing them
DF %>% group_by(symbol) %>%
filter(!any(n_consec > 365 & Volume == 0))
Upvotes: 2