Reputation: 125
I have the following data frame df
with longitudinal data in long format (see below). I would like to create a new variable called new.var
based on some conditions. This new variable should be 1, if the first value of postLin for a person is > 0 but <= 1 or if the the value of postLin == 0 and the value of preLin == 0. There can only be a maximum of one 1 on new.var
per person (ID
).
df <- read.table(text=
"ID preLin postLin
800057 -8.55 0
800057 -6.34 0
800057 -5.34 0
800057 -4.34 0
800057 0 0.33
800119 -0.88 0
800119 0 0
800119 0 1
834011 -4.1 0
834011 -3.1 0
834341 0 1.34
834341 0 2.34
834341 0 3.34
834341 0 5.34
834341 0 6.66
800125 0 0
800125 0 2.14
897177 -0.33 0
897177 0 0.67
897177 0 1.67", header=TRUE)
First, I tried this code:
df$new.var1 <- ifelse(df$preLin == 0 & (df$postLin >= 0 & df$postLin <= 1), 1, 0)
However, for ID 800119 there will then be two 1s.
My expected outcome would look like this:
df_new <- read.table(text=
"ID preLin postLin new.var
800057 -8.55 0 0
800057 -6.34 0 0
800057 -5.34 0 0
800057 -4.34 0 0
800057 0 0.33 1
800119 -0.88 0 0
800119 0 0 1
800119 0 1 0
834011 -4.1 0 0
834011 -3.1 0 0
834341 0 1.34 0
834341 0 2.34 0
834341 0 3.34 0
834341 0 5.34 0
834341 0 6.66 0
800125 0 0 1
800125 0 2.14 0
897177 -0.33 0 0
897177 0 0.67 1
897177 0 1.67 0", header=TRUE)
Does anyone know a solution for my issue? Many thanks in advance!
Upvotes: 0
Views: 98
Reputation: 4514
Here is an approach which uses tidyverse
. It groups your IDs and uses the filter you want. A little helper columns is build which finds the first occurence in postLin, it is removed later.
mutatewith
ifelse applies your rules. Then the result is joined with the original DF and NA
s are turned into zeros.
library(tidyverse)
df %>%
group_by(ID) %>%
dplyr::filter(postLin > 0 | (postLin == 0 & preLin == 0)) %>%
dplyr::mutate(
first = dplyr::first(postLin)
) %>%
mutate(new.var = ifelse((postLin == first & postLin < 1), 1, 0)) %>%
select(-c(first)) %>%
right_join(df, by = c("ID", "preLin", "postLin")) %>%
mutate(new.var = ifelse(is.na(new.var), 0, new.var)) %>%
arrange(ID, preLin, postLin)
#> # A tibble: 20 x 4
#> # Groups: ID [6]
#> ID preLin postLin new.var
#> <int> <dbl> <dbl> <dbl>
#> 1 800057 -8.55 0 0
#> 2 800057 -6.34 0 0
#> 3 800057 -5.34 0 0
#> 4 800057 -4.34 0 0
#> 5 800057 0 0.33 1
#> 6 800119 -0.88 0 0
#> 7 800119 0 0 1
#> 8 800119 0 1 0
#> 9 800125 0 0 1
#> 10 800125 0 2.14 0
#> 11 834011 -4.1 0 0
#> 12 834011 -3.1 0 0
#> 13 834341 0 1.34 0
#> 14 834341 0 2.34 0
#> 15 834341 0 3.34 0
#> 16 834341 0 5.34 0
#> 17 834341 0 6.66 0
#> 18 897177 -0.33 0 0
#> 19 897177 0 0.67 1
#> 20 897177 0 1.67 0
Upvotes: 0
Reputation: 932
Your expected outcome comes from declaring the condition in ifelse
as follows:
df$new.var1 <- ifelse((df$postLin > 0 & df$postLin < 1) | (df$preLin == 0 & df$postLin ==0), 1, 0)
value = 1 if df$postLin gets values between 0 and 1 (but not 0 and 1) or if both df$preLin and df$postLin are 0. Otherwise value = 0
Result:
ID preLin postLin new.var1
1 800057 -8.55 0.00 0
2 800057 -6.34 0.00 0
3 800057 -5.34 0.00 0
4 800057 -4.34 0.00 0
5 800057 0.00 0.33 1
6 800119 -0.88 0.00 0
7 800119 0.00 0.00 1
8 800119 0.00 1.00 0
9 834011 -4.10 0.00 0
10 834011 -3.10 0.00 0
11 834341 0.00 1.34 0
12 834341 0.00 2.34 0
13 834341 0.00 3.34 0
14 834341 0.00 5.34 0
15 834341 0.00 6.66 0
16 800125 0.00 0.00 1
17 800125 0.00 2.14 0
18 897177 -0.33 0.00 0
19 897177 0.00 0.67 1
20 897177 0.00 1.67 0
Remark: keeping the condition as <=1 will result in an additional one in the third 800119, as @shirewoman2 said in her comment
Upvotes: 0