Remove rows after a group by based on a column value

Question

I have a data set with IDs, dates, and quantities. Every ID has the same start date, but some IDs have a quantity of 0 on their first dates, so their effective start date should be later.

I am trying to filter the rows so that each ID starts at its first non-zero QTY.

df <- data.frame(ID = c("1", "1", "1", "1", "1", "1",
                        "2", "2", "2", "2", "2", "2",
                        "3", "3", "3", "3", "3", "3"),
                 Date = c(seq(as.Date("2000/1/1"), by = "month", length.out = 6),
                          seq(as.Date("2000/1/1"), by = "month", length.out = 6),
                          seq(as.Date("2000/1/1"), by = "month", length.out = 6)),
                 QTY = c(0, 0, 0, 40, 0, 60, 50, 55, 0, 70, 88, 64, 0, 43, 43, 0, 99, 99))

So basically in the df example above, I want ID number 1 to start from the fourth QTY value, and for ID number 3 to start from the second QTY value.

I tried using group_by and mutate to create a new column with ifelse, but I got an error.

My expected df is the first one transformed into the one below: for each ID, the leading rows with zeroes are removed until QTY first becomes greater than 0.

df <- data.frame(ID = c("1", "1", "1",
                        "2", "2", "2", "2", "2", "2",
                        "3", "3", "3", "3", "3"),
                 Date = c(seq(as.Date("2000/4/1"), by = "month", length.out = 3),
                          seq(as.Date("2000/1/1"), by = "month", length.out = 6),
                          seq(as.Date("2000/2/1"), by = "month", length.out = 5)),
                 QTY = c(40, 0, 60, 50, 55, 0, 70, 88, 64, 43, 43, 0, 99, 99))

arg0naut91 · Accepted Answer

If the data is as in your example (i.e. no NA or negative values), this base R solution would suffice:

df[with(df, ave(QTY, ID, FUN = cumsum)) > 0, ]

Or subset(df, ave(QTY, ID, FUN = cumsum) > 0), though that is likely a tiny bit slower.

Output:

   ID       Date QTY
4   1 2000-04-01  40
5   1 2000-05-01   0
6   1 2000-06-01  60
7   2 2000-01-01  50
8   2 2000-02-01  55
9   2 2000-03-01   0
10  2 2000-04-01  70
11  2 2000-05-01  88
12  2 2000-06-01  64
14  3 2000-02-01  43
15  3 2000-03-01  43
16  3 2000-04-01   0
17  3 2000-05-01  99
18  3 2000-06-01  99
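
To see why the cumulative sum works as a filter, it may help to look at the running totals for two of the IDs. The sketch below uses only the QTY values from the question; the variable names (qty_id1, run1, and so on) are made up for illustration.

```r
# QTY values for ID "1" and ID "3", copied from the question's example
qty_id1 <- c(0, 0, 0, 40, 0, 60)
qty_id3 <- c(0, 43, 43, 0, 99, 99)

# Running total within each ID: it stays at 0 through the leading zeroes
# and turns positive at the first non-zero QTY
run1 <- cumsum(qty_id1)  # 0 0 0 40 40 100
run3 <- cumsum(qty_id3)  # 0 43 86 86 185 284

# Rows kept by the filter: everything from the first positive total onward.
# A later zero (e.g. the 5th value of ID "1") is kept because the total
# is already positive by then.
keep1 <- run1 > 0
keep3 <- run3 > 0
```

Here which(keep1) gives positions 4 to 6 and which(keep3) gives positions 2 to 6, which matches the rows in the output above.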

The same filter with dplyr:

library(dplyr)

df %>%
  group_by(ID) %>%
  filter(cumsum(QTY) > 0)
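
If the data could contain NA or negative values (the case excluded above), a plain cumsum may misbehave: an NA makes every later running total NA, and a negative value can cancel out a later positive one. One possible workaround, sketched here in base R, is to take the cumulative sum of a "QTY is a known positive number" indicator instead of QTY itself. The df2 data and the names is_pos, keep and result are hypothetical and not part of the original question.

```r
# Hypothetical data with an NA and a negative value (not from the question)
df2 <- data.frame(
  ID  = c("1", "1", "1", "1", "2", "2", "2"),
  QTY = c(0, NA, 40, 0, -5, 5, 0)
)

# 1 where QTY is a known positive number, 0 otherwise (NA counts as 0)
is_pos <- as.integer(!is.na(df2$QTY) & df2$QTY > 0)

# Running count of positive rows within each ID;
# keep every row from the first positive QTY onward
keep <- ave(is_pos, df2$ID, FUN = cumsum) > 0
result <- df2[keep, ]
```

With this data, result holds rows 3 and 4 for ID "1" and rows 6 and 7 for ID "2". Whether NA and negative rows before the first positive value should really be dropped depends on what they mean in your data, so treat this as one option rather than the definitive rule.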
