Reputation: 1773
I want to create a new column called age based on a previous column and group by functions. The dataset is as follows:
tid<- c(1,2,3,4, 1,2,3,4,1,2,3,4)
active<- c(0,1,0,4, 0,0,0,1,0,0,1,0)
person<- c('John', 'John','John', 'John', 'Emma', 'Emma','Emma','Emma', 'Edward', 'Edward', 'Edward', 'Edward')
df<- data.frame(tid, active, person)
I want to create age which starts with 0 when the person is first active i.e., the value of active becomes greater than 0 for first time and then incrementally gets added with one value for the next record. Any suggestions?
I am expecting the output as follows:
name age
John 0
John 0
John 1
John 2
Emma 0
Emma 0
Emma 0
Emma 0
Edward 0
Edward 0
Edward 0
Edward 1
Upvotes: 0
Views: 2571
Reputation: 31
An alternative solution that also does the job:
library(tidyverse)
age_counter = df %>%
arrange(tid) %>%
group_by(person) %>%
filter(cumsum(active) > 0) %>%
mutate(age = row_number() - 1)
df %>%
left_join(age_counter) %>%
replace_na(list(age = 0)) %>%
select(person, age)
Upvotes: 1
Reputation: 1950
Does this solve it for you?
library(dplyr)
df %>%
group_by(person) %>%
arrange(person, tid) %>%
mutate(active_dummy = if_else(lag(cumsum(active)) > 0, 1, 0, 0),
age = cumsum(active_dummy)) %>%
select(person, age)
which gives you
# A tibble: 12 x 2
# Groups: person [3]
person age
<chr> <dbl>
1 John 0.
2 John 0.
3 John 1.
4 John 2.
5 Emma 0.
6 Emma 0.
7 Emma 0.
8 Emma 0.
9 Edward 0.
10 Edward 0.
11 Edward 0.
12 Edward 1.
Upvotes: 4