MackMalc

Reputation: 21

R (dplyr) - add a column in which each value depends on its own previous value

Column c should turn TRUE when measurement reaches 0.1 or more, and remain TRUE for as long as measurement stays at or above 0.01 (so it is allowed to fall to 0.05, for example). Once measurement falls below 0.01, c should turn FALSE. It then turns TRUE again only when measurement reaches 0.1 again at some point, and so on.

I am currently using a base R for loop because the value of c depends on its previous value. While this works correctly, it is extremely slow when applied to my full data set. I would like to reproduce column df$c in a more efficient way.

Minimal reproducible example below.

library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

measurement <- c(0.05, 0.001, 0.003, 0.1, 0.12, 0.13, 0.05, 0.03, 0.02, 0.005, 0.005, 0.006, 0.08, 0.12, 0.02, 0.00065)

df <- measurement %>% as.data.frame()  # piping this way names the single column "."

# a: inside the hold band [0.01, 0.1); b: at or above the switch-on threshold
df %<>% mutate(a = measurement >= 0.01 & measurement < 0.1,
               b = measurement >= 0.1) %>% 
  mutate(c = b)  # initialise c

# df$c depends on its previous value and current values of a and b:
for (i in 2:nrow(df)) df$c[i] <- ifelse(df$b[i], TRUE, ifelse( df$c[i-1] & df$a[i], TRUE, FALSE))

df

df looks like this:

         .     a     b     c
1  0.05000  TRUE FALSE FALSE
2  0.00100 FALSE FALSE FALSE
3  0.00300 FALSE FALSE FALSE
4  0.10000 FALSE  TRUE  TRUE
5  0.12000 FALSE  TRUE  TRUE
6  0.13000 FALSE  TRUE  TRUE
7  0.05000  TRUE FALSE  TRUE
8  0.03000  TRUE FALSE  TRUE
9  0.02000  TRUE FALSE  TRUE
10 0.00500 FALSE FALSE FALSE
11 0.00500 FALSE FALSE FALSE
12 0.00600 FALSE FALSE FALSE
13 0.08000  TRUE FALSE FALSE
14 0.12000 FALSE  TRUE  TRUE
15 0.02000  TRUE FALSE  TRUE
16 0.00065 FALSE FALSE FALSE 

Upvotes: 1

Views: 55

Answers (1)

TarJae

Reputation: 78927

We could do something like this:

library(dplyr)

df %>% 
  mutate(x = case_when(between(., 0.01, 0.1) ~ "a",
                       . >= 0.1 ~ "b", 
                       TRUE ~ "c")) %>% 
  cbind(model.matrix(~ x + 0, .) == 1)
         . x    xa    xb    xc
1  0.05000 a  TRUE FALSE FALSE
2  0.00100 c FALSE FALSE  TRUE
3  0.00300 c FALSE FALSE  TRUE
4  0.10000 a  TRUE FALSE FALSE
5  0.12000 b FALSE  TRUE FALSE
6  0.13000 b FALSE  TRUE FALSE
7  0.05000 a  TRUE FALSE FALSE
8  0.03000 a  TRUE FALSE FALSE
9  0.02000 a  TRUE FALSE FALSE
10 0.00500 c FALSE FALSE  TRUE
11 0.00500 c FALSE FALSE  TRUE
12 0.00600 c FALSE FALSE  TRUE
13 0.08000 a  TRUE FALSE FALSE
14 0.12000 b FALSE  TRUE FALSE
15 0.02000 a  TRUE FALSE FALSE
16 0.00065 c FALSE FALSE  TRUE
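
The indicator columns above classify each row on its own; they do not carry state from one row to the next. The latching column c from the question can probably be obtained without a loop by treating each row as an "on" event (measurement >= 0.1), an "off" event (measurement < 0.01), or a "hold" row, and then carrying the most recent event forward. Below is a minimal base R sketch of that idea. The function name hysteresis and its arguments on and off are made up for illustration, and the initial state is assumed to be FALSE, as in the question's loop.

```r
# Example data from the question
measurement <- c(0.05, 0.001, 0.003, 0.1, 0.12, 0.13, 0.05, 0.03, 0.02,
                 0.005, 0.005, 0.006, 0.08, 0.12, 0.02, 0.00065)

# hysteresis() is a hypothetical helper, not part of dplyr or base R.
# on  : threshold at or above which the state switches to TRUE
# off : threshold below which the state switches to FALSE
hysteresis <- function(x, on = 0.1, off = 0.01) {
  # TRUE = switch on, FALSE = switch off, NA = keep the previous state
  event <- ifelse(x >= on, TRUE, ifelse(x < off, FALSE, NA))

  # Position of the most recent on/off event at or before each row
  # (0 if no event has occurred yet)
  idx <- cummax(ifelse(is.na(event), 0L, seq_along(x)))

  # Prepend the assumed initial state (FALSE) so that idx == 0 maps to it
  c(FALSE, event)[idx + 1L]
}

result <- hysteresis(measurement)
result
```

On the example data this should reproduce df$c from the question. Because it relies only on vectorised ifelse() and cummax(), it avoids the row-by-row assignment into the data frame, which is likely what makes the loop slow. It could be attached to the data with something like df$c <- hysteresis(df[[1]]).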

Upvotes: 2
