Reputation: 294

Modify variables in longitudinal data sets (keep first appearance of values on person-level)

I have a dataframe:

i <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
t <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
x <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1)
y <- c(5, 6, 7, 8, 4, 5, 6, 7, 6, 7, 8, 8)
j1 <- c(NA, NA, NA, NA, NA, 5, NA, 7, NA, NA, 8, 8)

dat <- data.frame(i, t, x, y, j1)
dat

   i t x y j1
1  1 1 0 5 NA
2  1 2 0 6 NA
3  1 3 0 7 NA
4  1 4 0 8 NA
5  2 1 0 4 NA
6  2 2 1 5  5
7  2 3 0 6 NA
8  2 4 1 7  7
9  3 1 0 6 NA
10 3 2 0 7 NA
11 3 3 1 8  8
12 3 4 1 8  8

The dataframe covers 3 persons "i" at 4 points in time "t". "j1" takes the value of "y" when "x" turns from 0 to 1 for a person "i". While "x" stays at 1 for a person, "j1" does not change over time (see person 3). When "x" is 0, "j1" is always NA.

Now I want to add a new variable "j2" to the dataframe, which is a modification of "j1". For each person "i", "j2" should contain only one value: the first value of "j1" for that person (at the first change from 0 to 1 in "x"). All other rows should be NA.

Accordingly, the result should look like this:

dat

   i t x y j1 j2
1  1 1 0 5 NA NA
2  1 2 0 6 NA NA
3  1 3 0 7 NA NA
4  1 4 0 8 NA NA
5  2 1 0 4 NA NA
6  2 2 1 5  5 5
7  2 3 0 6 NA NA
8  2 4 1 7  7 NA
9  3 1 0 6 NA NA
10 3 2 0 7 NA NA
11 3 3 1 8  8 8
12 3 4 1 8  8 NA

I would appreciate suggestions on how to do this with dplyr.

Upvotes: 1

Answers (4)

GuedesBF

Reputation: 9858

Option1

You can use dplyr's mutate() together with replace() to set to NA every value of j1 for which both the current and the previous (lag()) value are non-NA:

library(dplyr)

dat %>% group_by(i) %>%
        mutate(j2=replace(j1, !is.na(j1) & !is.na(lag(j1)), NA))

Option2

You can use replace() to set to NA all values in j1 except the first non-NA value. The negative index -which(!is.na(j1))[1] selects every position other than that one.

dat %>% group_by(i) %>%
        mutate(j2=replace(j1, -which(!is.na(j1))[1], NA))

Option3

You can use purrr::accumulate() too. Call accumulate(), comparing the accumulated value (.x) with the next value (.y) of the j1 vector. If they are the same, the output will be NA.

library(dplyr)

dat %>% group_by(i) %>%
        mutate(j2=purrr::accumulate(j1, ~ifelse(.x %in% .y, NA, .y)))

Output (Option1 and Option3)

# A tibble: 12 x 6
# Groups:   i [3]
       i     t     x     y    j1    j2
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     1     0     5    NA    NA
 2     1     2     0     6    NA    NA
 3     1     3     0     7    NA    NA
 4     1     4     0     8    NA    NA
 5     2     1     0     4    NA    NA
 6     2     2     1     5     5     5
 7     2     3     0     6    NA    NA
 8     2     4     1     7     7     7
 9     3     1     0     6    NA    NA
10     3     2     0     7    NA    NA
11     3     3     1     8     8     8
12     3     4     1     8     8    NA
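
The lag()-based rule and the "first non-NA only" rule differ for person 2, where x goes back to 0 and then to 1 again. The following is a minimal sketch of that difference, assuming only the i and j1 columns from the question; the column names j2_lag and j2_first are made up for this illustration.

```r
library(dplyr)

# Only the columns needed for the comparison (values taken from the question)
dat <- data.frame(
  i  = rep(1:3, each = 4),
  j1 = c(NA, NA, NA, NA, NA, 5, NA, 7, NA, NA, 8, 8)
)

res <- dat %>%
  group_by(i) %>%
  mutate(
    # blank a value only when the previous value is also non-NA
    j2_lag   = replace(j1, !is.na(j1) & !is.na(lag(j1)), NA),
    # blank every position except the first non-NA one
    j2_first = replace(j1, -which(!is.na(j1))[1], NA)
  ) %>%
  ungroup()

# Person 2: j2_lag keeps both 5 and 7, j2_first keeps only 5
res[res$i == 2, ]
```

If only one value per person is wanted, as in the expected result in the question, the second rule appears to be the one that matches.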

Upvotes: 2

shs

Reputation: 3901

Somewhat more concise than the others:

library(tidyverse)

dat <- structure(list(i = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3), t = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), x = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1), y = c(5, 6, 7, 8, 4, 5, 6, 7, 6, 7, 8, 8), j1 = c(NA,  NA, NA, NA, NA, 5, NA, 7, NA, NA, 8, 8)), class = "data.frame", row.names = c(NA, -12L))

dat %>%  
  group_by(i) %>% 
  mutate(j2 = ifelse(1:n() == which(x == 1)[1], y, NA)) %>% 
  ungroup()
#> # A tibble: 12 × 6
#>        i     t     x     y    j1    j2
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     1     1     0     5    NA    NA
#>  2     1     2     0     6    NA    NA
#>  3     1     3     0     7    NA    NA
#>  4     1     4     0     8    NA    NA
#>  5     2     1     0     4    NA    NA
#>  6     2     2     1     5     5     5
#>  7     2     3     0     6    NA    NA
#>  8     2     4     1     7     7    NA
#>  9     3     1     0     6    NA    NA
#> 10     3     2     0     7    NA    NA
#> 11     3     3     1     8     8     8
#> 12     3     4     1     8     8    NA
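
For a person who never has x == 1 (person 1), which(x == 1)[1] is NA, so the comparison is NA for every row and ifelse() returns NA throughout. A small base R sketch of that case on a single person's vectors (seq_along(x) stands in for 1:n(), which only works inside dplyr verbs):

```r
x <- c(0, 0, 0, 0)   # a person who never has x == 1
y <- c(5, 6, 7, 8)

first_one <- which(x == 1)[1]
first_one            # NA: there is no first position with x == 1

hit <- seq_along(x) == first_one
hit                  # NA NA NA NA

j2 <- ifelse(hit, y, NA)
j2                   # NA NA NA NA
```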

Upvotes: 2

det

Reputation: 5232

Function f sets every element after the first non-NA value of vector x to NA. f is applied to j1 within each group defined by i.

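library(dplyr)

f <- function(x){
  # position of the first non-NA value (NA if there is none)
  ind <- which(!is.na(x))[1]
  # nothing to blank out: x is all NA, or the first non-NA value is the last element
  if(is.na(ind) || ind == length(x)) return(x)
  # set everything after the first non-NA value to NA
  x[(ind + 1):length(x)] <- NA
  x
}

dat %>%
  group_by(i) %>%
  mutate(j2 = f(j1)) %>%
  ungroup()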
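
To see what f does outside of a pipeline, it can be called on each person's j1 vector directly. This is a small sketch; f is repeated here (with the same logic) only so the snippet runs on its own.

```r
f <- function(x) {
  ind <- which(!is.na(x))[1]                     # first non-NA position, or NA
  if (is.na(ind) || ind == length(x)) return(x)  # nothing after it to blank out
  x[(ind + 1):length(x)] <- NA                   # blank everything after it
  x
}

f(c(NA, 5, NA, 7))    # NA  5 NA NA   (person 2)
f(c(NA, NA, 8, 8))    # NA NA  8 NA   (person 3)
f(c(NA, NA, NA, NA))  # NA NA NA NA   (person 1, returned unchanged)
f(c(NA, NA, NA, 7))   # NA NA NA  7   (first non-NA value is the last element)
```

The ind == length(x) guard matters in the last call: without it, (ind + 1):length(x) would be 5:4, a descending sequence, and the assignment would blank position 4 and extend the vector to length 5.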

Upvotes: 2

Yuriy Saraykin

Reputation: 8880

A possible solution:

library(tidyverse)
i <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
t <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
x <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1)
y <- c(5, 6, 7, 8, 4, 5, 6, 7, 6, 7, 8, 8)
j1 <- c(NA, NA, NA, NA, NA, 5, NA, 7, NA, NA, 8, 8)

df <- data.frame(i, t, x, y, j1)

tmp <- df %>% 
  filter(x == 1) %>% 
  group_by(i) %>% 
  slice(1) %>% 
  ungroup() %>% 
  rename(j2 = j1)

left_join(df, tmp)
#> Joining, by = c("i", "t", "x", "y")
#>    i t x y j1 j2
#> 1  1 1 0 5 NA NA
#> 2  1 2 0 6 NA NA
#> 3  1 3 0 7 NA NA
#> 4  1 4 0 8 NA NA
#> 5  2 1 0 4 NA NA
#> 6  2 2 1 5  5  5
#> 7  2 3 0 6 NA NA
#> 8  2 4 1 7  7 NA
#> 9  3 1 0 6 NA NA
#> 10 3 2 0 7 NA NA
#> 11 3 3 1 8  8  8
#> 12 3 4 1 8  8 NA

Created on 2021-09-08 by the reprex package (v2.0.1)
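
The join above matches on all columns shared by df and tmp (i, t, x, y), as the "Joining, by" message shows. A possible variant, sketched here under the assumption that i and t together identify a row uniquely, keeps only those two keys in the lookup table and names them explicitly in the join:

```r
library(dplyr)

df <- data.frame(
  i  = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  t  = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  x  = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1),
  y  = c(5, 6, 7, 8, 4, 5, 6, 7, 6, 7, 8, 8),
  j1 = c(NA, NA, NA, NA, NA, 5, NA, 7, NA, NA, 8, 8)
)

# One row per person: the first time point with x == 1
tmp <- df %>%
  filter(x == 1) %>%
  group_by(i) %>%
  slice(1) %>%
  ungroup() %>%
  select(i, t, j2 = j1)   # keep only the join keys and the new column

# Explicit keys avoid the "Joining, by" message and accidental extra keys
res <- left_join(df, tmp, by = c("i", "t"))
res
```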

Upvotes: 2
