Reputation: 15
I have a dataframe with group and time-id. Now I want to create a new count variable, called X2
, conditional on previous values of X1
within each group.
Suppose I have the following dataframe, variables group
, time
, X1
, and want to create X2
. The value of X2
should be a count variable indicating the number of periods (i.e., rows) since X1
has last been equal to 1 within the given group. If all previous values of X1
are 0, X2
should be missing.
group time X1 X2
1 1 1 0 NA
2 1 2 0 NA
3 1 3 1 NA
4 1 4 0 0
5 1 5 1 1
6 2 1 0 NA
7 2 2 1 NA
8 2 3 1 0
9 2 4 1 0
10 2 5 0 0
11 3 1 0 NA
12 3 2 0 NA
13 3 3 0 NA
14 3 4 1 NA
15 3 5 0 0
16 4 1 1 NA
17 4 2 0 0
18 4 3 0 1
19 4 4 0 2
20 4 5 1 3
data_test <- data.frame(
group = rep(1:4, each = 5),
time = rep(1:5, n = 4),
X1 = c(0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1),
X2 = c(NA, NA, NA, 0, 1, NA, NA, 0, 0, 0, NA, NA, NA, NA, 0, NA, 0, 1, 2, 3))
I could easily create the NAs by using an if command with cumsum()==0
. However, I have no idea how to approach the other part, i.e., create the count.
Any help would be much appreciated - if at all possible using dplyr
.
Thanks a lot.
Upvotes: 0
Views: 267
Reputation: 30474
This isn't very elegant, but it incorporates the rules provided I think.
A column is added for the row numbers where X1
is 1, and fill
used so you can use the most recent row number available for that condition.
To calculate X2_new
, if X1
is 0, it is the difference between the current row number within the group minus the last row number where X1
was 1. If X1
is 1 (and not the first X1
of 1 in the group), then do similar calculation but on the previous rn
reference.
library(tidyverse)
data_test %>%
group_by(group) %>%
mutate(rn = ifelse(X1 == 1, row_number(), NA)) %>%
fill(rn) %>%
mutate(X2_new = ifelse(X1 == 0 & row_number() > rn, row_number() - rn - 1, NA),
X2_new = ifelse(X1 == 1 & !is.na(lag(rn)), row_number() - lag(rn) - 1, X2_new)) %>%
select(-rn)
Output
# A tibble: 20 x 5
# Groups: group [4]
group time X1 X2 X2_new
<int> <int> <dbl> <dbl> <dbl>
1 1 1 0 NA NA
2 1 2 0 NA NA
3 1 3 1 NA NA
4 1 4 0 0 0
5 1 5 1 1 1
6 2 1 0 NA NA
7 2 2 1 NA NA
8 2 3 1 0 0
9 2 4 1 0 0
10 2 5 0 0 0
11 3 1 0 NA NA
12 3 2 0 NA NA
13 3 3 0 NA NA
14 3 4 1 NA NA
15 3 5 0 0 0
16 4 1 1 NA NA
17 4 2 0 0 0
18 4 3 0 1 1
19 4 4 0 2 2
20 4 5 1 3 3
Upvotes: 1
Reputation: 1
Using the pipe (%>%
), mutate
, group_by
, and lag
functions from the dplyr
package
library(dplyr)
data_test %>%
group_by(group) %>%
mutate(X2 = ifelse(lag(X1) == 1, 0, lag(X2) + 1))
This does assume the difference in time
will always be 1 as shown in your sample data. If time
varies in your data and X2
should reflect the difference in time
:
data_test %>%
group_by(group) %>%
mutate(X2 = ifelse(lag(X1) == 1, 0, lag(X2) + (time - lag(time))))
Upvotes: 0