cahill_598
cahill_598

Reputation: 37

Fill in column based on previous value in R

I am looking for a tidyverse solution to the following problem.

I have the following example dataset:

ex_tbl <- tibble(col1 = c(5, NA, NA, NA), col2 = c(1, 1.3, 0.8, 1.2))

I need to fill in the NAs based on col2 and the previous value in the col1. If I start with the first NA (col1, row 2), and col2 is greater than 1, then I'd multiply the previous col1 value (row 1) by col2 row 2. Then if col2 is less than or equal to 1, then col1 is just equal to the previous value.

This should be the final result:

  col1  col2
     5   1  
    6.5  1.3
    6.5  0.8
    7.8  1.2

I tried case_when() but am not getting the desired result:

output <- ex_tbl %>% mutate(col1 = case_when(col2 > 1 ~ col1 * col2,
                                              col2 <= 1 ~ col1,
                                              TRUE ~ col1)

Clearly, I'm not capturing the part where I want to use the previous row's value. Any help would be appreciated.

Upvotes: 2

Views: 848

Answers (2)

Martin Gal
Martin Gal

Reputation: 16998

Perhaps you could use

library(dplyr)

ex_tbl %>% 
  mutate(col1 = first(col1) * cumprod(pmax(col2, 1)))

to get

# A tibble: 4 x 2
   col1  col2
  <dbl> <dbl>
1   5     1  
2   6.5   1.3
3   6.5   0.8
4   7.8   1.2

Edit: Follow-up question

If you want to multiply by a value from a third column in case of col2 < 1, you could use:

ex_tbl <- tibble(col1 = c(5, NA, NA, NA), col2 = c(1, 1.3, 0.8, 1.2), col3 = c(0.5, 2, 2, 0.3))

ex_tbl %>% 
  mutate(
    factor = (col2 >= 1) * col2 + (col2 < 1) * col3,
    col1 = first(col1) * cumprod((col2 >= 1) * col2 + (col2 < 1) * col3))

Note: the column factor here isn't necessary. It's just to show what is happening inside the cumprod function.

Upvotes: 2

akrun
akrun

Reputation: 887881

We may need accumulate here

library(dplyr)
library(purrr)
ex_tbl %>% 
   mutate(coln = accumulate(col2, ~  if(.y <= 1) .x else .x * .y , 
         .init = first(col1))[-1])

-output

# A tibble: 4 × 3
   col1  col2  coln
  <dbl> <dbl> <dbl>
1     5   1     5  
2    NA   1.3   6.5
3    NA   0.8   6.5
4    NA   1.2   7.8

Upvotes: 1

Related Questions