C.F.
C.F.

Reputation: 294

Generating new variables for changes over time in longitudinal data with dplyr

I have following dataframe:

t <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
i <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
x <- c(3, 3, 2, 0, NA, NA, 1, 3, NA)
y <- c(0, 0, 0, 0, 1, 1, 0, 0, 1)

df <- data.frame(t, i, x, y)
df

> df
  t i  x y
1 1 1  3 0
2 2 1  3 0
3 3 1  2 0
4 1 2  0 0
5 2 2 NA 1
6 3 2 NA 1
7 1 3  1 0
8 2 3  3 0
9 3 3 NA 1

It is some example longitudinal data with 3 points in time (t), 3 objects (i) and two variables x and y for the 3 objects i. Note that x is always NA, when y equals 1. I want to generate a new variable z, which duplicates x; but with one difference: NA's in x should become the last available value from x. That means, NA's in i = 2 both become 0 and NA in i = 3 becomes 3. The result is following dataframe:

  t i  x y z
1 1 1  3 0 3
2 2 1  3 0 3
3 3 1  2 0 2
4 1 2  0 0 0
5 2 2 NA 1 0
6 3 2 NA 1 0
7 1 3  1 0 1
8 2 3  3 0 3
9 3 3 NA 1 3

In a last step i want to change the new variable z to NA when y is always 0 for an object i:

  t i  x y  z
1 1 1  3 0 NA
2 2 1  3 0 NA
3 3 1  2 0 NA
4 1 2  0 0  0
5 2 2 NA 1  0
6 3 2 NA 1  0
7 1 3  1 0  1
8 2 3  3 0  3
9 3 3 NA 1  3

So in the end, the new variable z only contains the information on x for changes in y on the level of i. However, it is important to me, that the last step (generating NA's in z) is implemented separately. I appreciate any advice on how to do this in dplyr.

Upvotes: 1

Views: 409

Answers (2)

akrun
akrun

Reputation: 887193

We may use na.locf0 from zoo

library(dplyr)
library(zoo)
df %>%
      group_by(i) %>%
      mutate(z = na.locf0(x) * NA^(all(y == 0))) %>%
      ungroup
# A tibble: 9 x 5
      t     i     x     y     z
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     3     0    NA
2     2     1     3     0    NA
3     3     1     2     0    NA
4     1     2     0     0     0
5     2     2    NA     1     0
6     3     2    NA     1     0
7     1     3     1     0     1
8     2     3     3     0     3
9     3     3    NA     1     3

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388992

You can use fill to replace the NA values in z to last available value and replace z with NA if all y values in i is 0.

library(dplyr)
library(tidyr)

df %>%
  mutate(z = x) %>%
  group_by(i) %>%
  fill(z) %>%
  mutate(z = if(all(y == 0)) NA else z) %>%
  ungroup

#      t     i     x     y     z
#  <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1     1     3     0    NA
#2     2     1     3     0    NA
#3     3     1     2     0    NA
#4     1     2     0     0     0
#5     2     2    NA     1     0
#6     3     2    NA     1     0
#7     1     3     1     0     1
#8     2     3     3     0     3
#9     3     3    NA     1     3 

Upvotes: 1

Related Questions