Reputation: 355

Set value to 0 if any of the remaining values is 0

I have a data.frame like this:

dat <- data.frame(ID = c(rep(1, 13), rep(2, 5)), time = c(1:13, 1:5), value = c(rep(0, 5), rep(1, 3), 2, 0, 1, 5, 20, rep(0, 2), 1:3))
   ID time value
1   1    1     0
2   1    2     0
3   1    3     0
4   1    4     0
5   1    5     0
6   1    6     1
7   1    7     1
8   1    8     1
9   1    9     2
10  1   10     0
11  1   11     1
12  1   12     5
13  1   13    20
14  2    1     0
15  2    2     0
16  2    3     1
17  2    4     2
18  2    5     3

My goal is to set a value to 0 if any of the remaining (later) values is 0, within each ID and with the rows sorted by time. In the example data, that means rows 6 to 9 should become 0.

I tried dat %>% group_by(ID) %>% mutate(value2 = ifelse(lead(value, order_by = time) == 0, 0, value)), but I would have to run this several times, since each run only changes one more row (row 9 first, then row 8, and so on).
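To see why a single pass is not enough, here is a minimal base R sketch (not part of the original attempt) that repeats the same one-step "look at the next row" rule until the values stop changing. Each pass only moves a zero back by one row, so a run of k non-zero values in front of a zero needs k passes. The helper name propagate_zero_stepwise is hypothetical, and it assumes x holds one ID's values already sorted by time.

```r
# Hypothetical helper: repeat the one-step lead() rule until nothing changes.
# Assumes `x` holds one ID's values already sorted by time.
propagate_zero_stepwise <- function(x) {
  repeat {
    next_val <- c(x[-1], NA)  # like lead(x): the following value, NA for the last row
    updated <- ifelse(!is.na(next_val) & next_val == 0, 0, x)
    if (identical(updated, x)) return(x)  # stop once a pass changes nothing
    x <- updated
  }
}

propagate_zero_stepwise(c(0, 0, 0, 0, 0, 1, 1, 1, 2, 0, 1, 5, 20))
```

The loop terminates because each pass can only turn values into 0, but it does up to one pass per row, which is what the answers avoid.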

A dplyr solution would be preferred, but I'd take anything that works :)

Short explanation: value is the size of a tumor. If the tumor does not grow large but vanishes completely at a later time, it was most likely an irrelevant encapsulation and should therefore be coded as "zero tumor".

Upvotes: 3

Answers (4)

Darren Tsai

Reputation: 35624

You can use accumulate(..., .dir = "backward") from purrr:

library(dplyr)
library(purrr)

dat %>%
  group_by(ID) %>%
  arrange(time, .by_group = TRUE) %>%
  mutate(value2 = accumulate(value, ~ if(.y == 0) 0 else .x, .dir = "backward")) %>%
  ungroup()

# A tibble: 18 x 4
      ID  time value value2
   <dbl> <int> <dbl>  <dbl>
 1     1     1     0      0
 2     1     2     0      0
 3     1     3     0      0
 4     1     4     0      0
 5     1     5     0      0
 6     1     6     1      0
 7     1     7     1      0
 8     1     8     1      0
 9     1     9     2      0
10     1    10     0      0
11     1    11     1      1
12     1    12     5      5
13     1    13    20     20
14     2     1     0      0
15     2     2     0      0
16     2     3     1      1
17     2     4     2      2
18     2     5     3      3
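For comparison, a base R sketch of the same right-to-left accumulation uses Reduce() with right = TRUE and accumulate = TRUE. This is a swapped-in equivalent, not part of this answer. With right = TRUE, the function receives the current element first and the already-accumulated value from the right second, which matches the roles of .x and .y in the purrr lambda above. The helper name backward_zero_fill is hypothetical, and x is assumed to be one ID's values sorted by time.

```r
# Hypothetical helper: base R version of accumulate(..., .dir = "backward") for this rule.
# Assumes `x` holds one ID's values already sorted by time.
backward_zero_fill <- function(x) {
  Reduce(
    # `current` is x[i]; `acc` is the already-computed result for position i + 1
    function(current, acc) if (acc == 0) 0 else current,
    x,
    accumulate = TRUE,
    right = TRUE  # start at the last time point and work back to the first
  )
}

backward_zero_fill(c(0, 0, 0, 0, 0, 1, 1, 1, 2, 0, 1, 5, 20))
```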

Upvotes: 0

IceCreamToucan

Reputation: 28705

With data.table you can calculate columns with the data in a certain order without actually reordering the table, which is useful here:

library(data.table)
setDT(dat)

dat[order(-time), value := fifelse(cumsum(value == 0) > 0, 0, value), ID]

dat
#     ID time value
#  1:  1    1     0
#  2:  1    2     0
#  3:  1    3     0
#  4:  1    4     0
#  5:  1    5     0
#  6:  1    6     0
#  7:  1    7     0
#  8:  1    8     0
#  9:  1    9     0
# 10:  1   10     0
# 11:  1   11     1
# 12:  1   12     5
# 13:  1   13    20
# 14:  2    1     0
# 15:  2    2     0
# 16:  2    3     1
# 17:  2    4     2
# 18:  2    5     3

Upvotes: 1

Cettt

Reputation: 11981

I am not sure whether this is your desired output, but maybe it can be useful to you:

dat %>%
  group_by(ID) %>%
  arrange(-time) %>%
  mutate(value = if_else(cumsum(value == 0) > 0, 0, value)) %>%
  arrange(ID, time)

     ID  time value
   <dbl> <int> <dbl>
 1     1     1     0
 2     1     2     0
 3     1     3     0
 4     1     4     0
 5     1     5     0
 6     1     6     0
 7     1     7     0
 8     1     8     0
 9     1     9     0
10     1    10     0
11     1    11     1
12     1    12     5
13     1    13    20
14     2     1     0
15     2     2     0
16     2     3     1
17     2     4     2
18     2     5     3

Basically, I first put the observations in descending order of time. Then I check whether a zero has already appeared in value (cumsum(value == 0) > 0); if so, the value is set to zero, so every observation up to and including the last zero in time becomes zero. Finally, I put the observations back in the correct order.
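The same descending-order logic can be expressed without re-sorting the rows by reversing each ID's vector with rev(). This is a base R sketch rather than the answer's dplyr code; zero_if_later_zero is a hypothetical helper name, and the rows are sorted by ID and time first so that reversing a group's values really means walking back in time.

```r
# Hypothetical helper: flag "this value or a later one is 0" by reversing instead of re-sorting.
# Assumes `x` holds one ID's values already sorted by time.
zero_if_later_zero <- function(x) {
  # Walking backwards, cumsum(rev(x == 0)) > 0 turns TRUE at the last zero and stays TRUE;
  # the outer rev() puts that flag back into time order.
  later_zero <- rev(cumsum(rev(x == 0)) > 0)
  ifelse(later_zero, 0, x)
}

dat <- data.frame(ID = c(rep(1, 13), rep(2, 5)),
                  time = c(1:13, 1:5),
                  value = c(rep(0, 5), rep(1, 3), 2, 0, 1, 5, 20, rep(0, 2), 1:3))
dat <- dat[order(dat$ID, dat$time), ]

# ave() applies the helper within each ID and returns results aligned with the rows of dat
dat$value2 <- ave(dat$value, dat$ID, FUN = zero_if_later_zero)
dat
```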

If you do not want to order and reorder the data, you can use the following code, which relies on the same logic but is a bit harder to read:

dat %>%
  group_by(ID) %>%
  arrange(ID, time) %>%
  mutate(value = if_else(cumsum(value == 0) < sum(value == 0), 0, value))

Or, a bit more efficiently, without if_else (multiplying by a logical vector turns FALSE into 0):

dat %>%
  group_by(ID) %>%
  arrange(ID, time) %>%
  mutate(value = value * (cumsum(value == 0) >= sum(value == 0)))
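A small worked example of the counting trick on a single hypothetical vector (not from the original data): a row keeps its value only if the number of zeros seen so far already equals the group's total, i.e. no zero comes later.

```r
# Hypothetical single-ID vector, already sorted by time
x <- c(0, 0, 1, 2, 0, 1, 5)

zeros_so_far <- cumsum(x == 0)  # 1 2 2 2 3 3 3
zeros_total  <- sum(x == 0)     # 3

# TRUE only where no later zero exists; multiplying by FALSE gives 0
result <- x * (zeros_so_far >= zeros_total)
result
```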

Upvotes: 5

Iroha

Reputation: 34751

One way could be to find the indices of the first and last occurrences of 0 and replace everything in between.

library(dplyr)

dat %>%
  group_by(ID) %>%
  mutate(value = replace(value, between(row_number(), which.max(value == 0), tail(which(value == 0), 1)), 0))

# A tibble: 18 x 3
# Groups:   ID [2]
      ID  time value
   <dbl> <int> <dbl>
 1     1     1     0
 2     1     2     0
 3     1     3     0
 4     1     4     0
 5     1     5     0
 6     1     6     0
 7     1     7     0
 8     1     8     0
 9     1     9     0
10     1    10     0
11     1    11     1
12     1    12     5
13     1    13    20
14     2     1     0
15     2     2     0
16     2     3     1
17     2     4     2
18     2     5     3
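One caveat: this relies on each ID starting with a 0, as both IDs in the example do. For a hypothetical group c(3, 0, 1), the first and last zeros are both at position 2, so the leading 3 is left unchanged even though a later value is 0. A minimal base R sketch of a variant that instead replaces everything from the first row up to the last zero (the helper name zero_through_last_zero is hypothetical, and rows are assumed sorted by time within the ID):

```r
# Hypothetical helper: zero out rows 1..(position of the last zero) within one ID.
# Assumes `x` holds one ID's values already sorted by time.
zero_through_last_zero <- function(x) {
  zero_pos <- which(x == 0)
  if (length(zero_pos) == 0) return(x)  # no zeros: leave the group unchanged
  replace(x, seq_len(max(zero_pos)), 0)
}

zero_through_last_zero(c(3, 0, 1))
```

Inside the grouped pipeline it could be used as mutate(value = zero_through_last_zero(value)).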

Upvotes: 2
