Gebruiker10
Gebruiker10

Reputation: 13

Creating a new column with conditions in addition to the row value of the new column

Any ideas on how to create a new column B using the values of column A, while using the value of the row above of the new created colum B?

The value of B should be corresponding to:

A0 = value of the row above.
A1 = 1.
A2 = value of the row above + 1.

Current dataframe + desired outcome

                
Dataframe           Desired outcome  
A                   A   B
1                   1   1
0                   0   1
2                   2   2
0                   0   2
2                   2   3
0                   0   3
2                   2   4
0                   0   4
2                   2   5
0                   0   5
2                   2   6
0                   0   6
1                   1   1
0                   0   1
1                   1   1
0                   0   1
2                   2   2
0                   0   2
2                   2   3
0                   0   3
1                   1   1
0                   0   1
2                   2   2
0                   0   2


Data Frame
A <- c(1,0,2,0,2,0,2,0,2,0,2,0,1,0,1,0,2,0,2,0,1,0,2,0)
Bdesiredoutcome <- c(1,1,2,2,3,3,4,4,5,5,6,6,1,1,1,1,2,2,3,3,1,1,2,2)
df = data.frame(A,Bdesiredoutcome)

I tried using dpylr, mutate(), case_when() and lag() but keep running into errors. Due to using the lag() function. When using lag(A) the desired outcome cannot be generated. Any idea's on how to solve this problem?

df <- df %>%
          mutate(B = case_when((A == 0) ~ lag(B), 
                               (A == 1) ~ 1,
                               (A == 2) ~ (lag(B)+1)
    ))

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "function"
In addition: Warning message:

Upvotes: 1

Views: 65

Answers (2)

Marcio Bernardo
Marcio Bernardo

Reputation: 73

On your original question I got the following:

library(tidyverse)
library(lubridate)

df$date <-dmy(df$date)


df <- df %>% 
  arrange(id, date) %>%
  group_by(id) %>%
  mutate(daysbetween = replace_na(date - lag(date),0),
         ind = 1,
         NewA= case_when (daysbetween < 7 ~ 0, daysbetween > 7 ~ 1),
         NewB= case_when (daysbetween < 85 ~ 0, daysbetween > 85 ~ 1),
         A =   case_when (1 + cumsum(ind*NewA) <= 6 ~ 1 + cumsum(ind*NewA),
                          1 + cumsum(ind*NewA) > 6 ~ 1 + cumsum(ind*NewA) - 6),
         B =  1 + cumsum(ind*NewB))%>%
  select(id, date, A, B)

It only works if the reset for A is at 6. I used cumsum() as suggested above.

Upvotes: 0

akrun
akrun

Reputation: 887721

We can create a grouping column with cumsum and then create the 'B' column

library(dplyr)
df %>% 
   group_by(grp = cumsum(A == 1)) %>% 
   mutate(B = cumsum(A != 0)) %>% 
   ungroup %>%
   select(-grp) %>%
   as.data.frame

-output

   A Bdesired B
1  1        1 1
2  0        1 1
3  2        2 2
4  0        2 2
5  2        3 3
6  0        3 3
7  2        4 4
8  0        4 4
9  2        5 5
10 0        5 5
11 2        6 6
12 0        6 6
13 1        1 1
14 0        1 1
15 1        1 1
16 0        1 1
17 2        2 2
18 0        2 2
19 2        3 3
20 0        3 3
21 1        1 1
22 0        1 1
23 2        2 2
24 0        2 2

Upvotes: 1

Related Questions