Zizou

Reputation: 503

Select the last non-zero value before each run of zeros, by group

I have the following data frame:

   variable       Date values values2
1         a 2017-01-01      3       1
2         a 2017-01-02      4       2
3         a 2017-01-03      5       1 # non-zero followed by zero
4         a 2017-01-04      0       2
5         a 2017-01-05      0       2
6         a 2017-01-06      0       3
7         b 2017-01-01     10       1
8         b 2017-01-02     11       2
9         b 2017-01-03     12       2
10        b 2017-01-04     13       3
11        b 2017-01-05     14       0
12        b 2017-01-06     15       1
13        c 2017-01-01     45       2
14        c 2017-01-02     50       3 # non-zero followed by zero
15        c 2017-01-03      0       0
16        c 2017-01-04      0       2
17        c 2017-01-05     10       1 # non-zero followed by zero
18        c 2017-01-06      0       1

I want to select the rows where a non-zero value is immediately followed by a zero in the "values" column, within each variable:

1 a        2017-01-03      5       1
2 c        2017-01-02     50       3
3 c        2017-01-05     10       1

Please note that a single variable may contain more than one such row (as variable c does).

My code does not work correctly: it returns only the last non-zero row for each variable:

test_df <- structure(list(variable = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c","c"), 
                          Date = structure(c(17167, 17168, 17169, 17170, 17171, 17172, 17167, 17168, 17169, 17170, 17171, 17172, 17167, 17168, 17169, 17170, 17171, 17172), class = "Date"), 
                          values = c(3, 4, 5, 0, 0, 0, 10, 11, 12, 13, 14, 15, 45, 50,0, 0, 10, 0),
                          values2 = c(1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 0, 1, 2, 3, 0, 2, 1, 1)), 
                          row.names = c(NA, -18L), class = "data.frame", .Names = c("variable", "Date", "values", "values2"))

library(dplyr)

test_df %>% 
  group_by(variable=factor(variable)) %>% 
  filter(any(values==0)) %>% 
  filter(values != 0) %>% 
  arrange(Date) %>% 
  slice(n()) %>% 
  ungroup() 

1 a        2017-01-03      5       1
2 c        2017-01-05     10       1

Upvotes: 4

Views: 664

Answers (4)

Anoushiravan R

Reputation: 21938

Within each group, you can keep the rows where values is non-zero and the next value (obtained with lead()) is zero:

library(dplyr)

test_df %>%
  group_by(variable) %>%
  filter(values != 0 & lead(values) == 0)

  variable       Date values values2
1        a 2017-01-03      5       1
2        c 2017-01-02     50       3
3        c 2017-01-05     10       1
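
A small sketch of why the group_by() step matters here, using a hypothetical `toy` data frame that is not part of the question. Within each group, lead() returns NA for the last row, the condition evaluates to NA there, and filter() drops rows whose condition is NA, so a non-zero value at the end of one group is never matched against a zero at the start of the next:

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration only):
# group "a" ends with a non-zero value and group "b" starts with a zero.
toy <- data.frame(
  variable = c("a", "a", "b", "b"),
  values   = c(0, 7, 0, 3)
)

# Without grouping, lead() looks across the group boundary,
# so the 7 in group "a" is paired with the 0 that opens group "b".
ungrouped <- toy %>%
  filter(values != 0 & lead(values) == 0)

# With grouping, lead() is NA on the last row of each group,
# the condition is NA there, and filter() drops that row.
grouped <- toy %>%
  group_by(variable) %>%
  filter(values != 0 & lead(values) == 0) %>%
  ungroup()
```

Here `ungrouped` keeps the row with values 7, while `grouped` keeps no rows.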

Upvotes: 10

Ronak Shah

Reputation: 389325

For completeness, here is a data.table answer:

library(data.table)
setDT(test_df)[, .SD[values != 0 & shift(values, type = 'lead') == 0], variable]

#   variable       Date values values2
#1:        a 2017-01-03      5       1
#2:        c 2017-01-02     50       3
#3:        c 2017-01-05     10       1

Upvotes: 2

ThomasIsCoding

Reputation: 102770

Another base R option, using diff:

subset(
  test_df,
  ave(values == 0, variable, FUN = function(x) c(diff(x) == 1, FALSE))
)
   variable       Date values values2
3         a 2017-01-03      5       1
14        c 2017-01-02     50       3
17        c 2017-01-05     10       1
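
To illustrate how the diff() condition works, here is a minimal sketch on a single plain vector (the values of variable c from the question). The intermediate names `is_zero`, `keep` and `selected` are introduced only for this illustration. diff() on the logical vector gives 1 exactly where a non-zero value is followed by a zero, and the trailing FALSE pads the result back to the original length:

```r
# Values of variable "c" from the question
x <- c(45, 50, 0, 0, 10, 0)

# TRUE where the value is zero
is_zero <- x == 0

# diff() treats FALSE/TRUE as 0/1, so a step of +1 marks a
# non-zero value immediately followed by a zero.
# The last element has no successor, so it is padded with FALSE.
keep <- c(diff(is_zero) == 1, FALSE)

selected <- x[keep]
```

For this vector, `selected` contains 50 and 10, matching the two rows returned for variable c.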

Upvotes: 2

akrun

Reputation: 887951

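Using subset from base R (ignoring the groups). The condition is padded with FALSE so that it has one element per row:

subset(test_df, c(head(values, -1) != 0 & tail(values, -1) == 0, FALSE))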
   variable       Date values values2
3         a 2017-01-03      5       1
14        c 2017-01-02     50       3
17        c 2017-01-05     10       1

If the data is grouped, wrap the condition in ave:

subset(test_df,  ave(values, variable, FUN = function(x) 
     c(head(x, -1)  != 0 & tail(x, -1) == 0, FALSE))> 0)
   variable       Date values values2
3         a 2017-01-03      5       1
14        c 2017-01-02     50       3
17        c 2017-01-05     10       1

Upvotes: 3
