Stat-R

Reputation: 5270

R: Tidyverse way of updating a value of a column at change points (What's wrong?)

I want to update the value of a column whenever another column signals a change. For example, in the following data I would like to create the column grp from the value column, a binary variable that marks change points. I attempted this by creating temp1, but the result is not what I want.

library(tidyverse)
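as_tibble(c(1,0,0,0,1,0,1,0)) %>%
  mutate(temp1 = 1,
         lag_temp1 = lag(temp1, 1, default = 1),
         # my attempt: add value to the previous row's temp1
         temp1 = ifelse(row_number() == 1, 1, value + lag_temp1)) %>%
  # desired result, hard-coded here for comparison
  mutate(grp = c(1, 1, 1, 1, 2, 2, 3, 3)) %>%
  print()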

    # A tibble: 8 x 4
      value temp1 lag_temp1   grp
      <dbl> <dbl>     <dbl> <dbl>
    1     1     1         1     1
    2     0     1         1     1
    3     0     1         1     1
    4     0     1         1     1
    5     1     2         1     2
    6     0     1         1     2
    7     1     2         1     3
    8     0     1         1     3

Update

Apart from getting grp right, I would also like to know why my solution did not work. I have used similar logic elsewhere in my data analysis, so it would help to know where the mistake is. Besides the built-in cumsum, I may have to use other functions at times.

Upvotes: 0

Views: 462

Answers (1)

markus

Reputation: 26343

To get the grp variable right we can use cumsum

library(tidyverse)
as_tibble(c(1, 0, 0, 0, 1, 0, 1, 0)) %>% 
  mutate(grp = cumsum(value))
# A tibble: 8 x 2
#  value   grp
#  <dbl> <dbl>
#1     1     1
#2     0     1
#3     0     1
#4     0     1
#5     1     2
#6     0     2
#7     1     3
#8     0     3

In your solution there is no difference between temp1 and lag_temp1 in the first place:

as_tibble(c(1,0,0,0,1,0,1,0)) %>%
  mutate(temp1 = 1,
         lag_temp1 = lag(temp1, 1, default = 1))

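Because temp1 is the constant 1 on every row, lag_temp1 is also 1 on every row. mutate() evaluates each expression on whole columns at once, so the lag is taken from that constant column and not from the temp1 you are about to compute; nothing is carried forward from one row to the next. So in the end temp1 is simply c(1, value[-1] + 1).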
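A minimal sketch to check this on the example data. It assumes dplyr is installed and builds the data with tibble(value = ...) instead of as_tibble(); the names df and res are only for illustration.

```r
library(dplyr)

df <- tibble(value = c(1, 0, 0, 0, 1, 0, 1, 0))

res <- df %>%
  mutate(
    temp1     = 1,                           # a constant, recycled to every row
    lag_temp1 = lag(temp1, 1, default = 1),  # lag of a constant column: still all 1
    temp1     = ifelse(row_number() == 1, 1, value + lag_temp1)
  )

# lag_temp1 never changes, so no running total is carried from row to row
all(res$lag_temp1 == 1)

# temp1 is the first-row constant followed by value + 1 for the remaining rows
all(res$temp1 == c(1, df$value[-1] + 1))
```

Both checks should come out TRUE, which is why temp1 jumps back to 1 after every change point instead of staying at the new level.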

It is not entirely clear to me what is meant by "Apart from inbuilt cumsum I may have to use other functions at times." - because this depends on the specific case. For the above example cumsum does the job.
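If the concern is a running computation that has no built-in cumulative function, one possible approach is purrr::accumulate(), which passes the previous result and the current element to a function you supply. The sketch below is only an illustration under that assumption: it reproduces cumsum with a hand-written step function, and the names prev, cur, grp_cumsum and grp_accum are hypothetical.

```r
library(dplyr)
library(purrr)

df <- tibble(value = c(1, 0, 0, 0, 1, 0, 1, 0))

res <- df %>%
  mutate(
    grp_cumsum = cumsum(value),
    # accumulate() calls the function with the previous result (prev) and the
    # current element (cur), so each row can depend on the row before it
    grp_accum  = accumulate(value, function(prev, cur) prev + cur)
  )

# the hand-written running sum agrees with cumsum on this data
all(res$grp_cumsum == res$grp_accum)
```

Swapping the body of the step function for another rule gives a different running computation, whereas lag() inside mutate() only ever sees the column as it was before the current expression was evaluated.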

Upvotes: 1
