R - Create new variable based on certain conditions

Question

I have the following data frame df with longitudinal data in long format (see below). I would like to create a new variable called new.var based on some conditions. This new variable should be 1, if the first value of postLin for a person is > 0 but <= 1 or if the the value of postLin == 0 and the value of preLin == 0. There can only be a maximum of one 1 on new.var per person (ID).

df <- read.table(text=
"ID       preLin   postLin      
800057    -8.55    0               
800057    -6.34    0           
800057    -5.34    0           
800057    -4.34    0         
800057        0    0.33   
800119    -0.88    0  
800119        0    0           
800119        0    1       
834011     -4.1    0 
834011     -3.1    0   
834341        0    1.34 
834341        0    2.34   
834341        0    3.34   
834341        0    5.34    
834341        0    6.66  
800125        0    0
800125        0    2.14
897177    -0.33    0 
897177        0    0.67 
897177        0    1.67", header=TRUE)

First, I tried this code:

df$new.var1 <- ifelse(df$preLin == 0 & (df$postLin >= 0 & df$postLin <= 1), 1, 0)

However, for ID 800119 there will then be two 1s.

My expected outcome would look like this:

df_new <- read.table(text=
"ID       preLin   postLin    new.var  
800057    -8.55    0          0         
800057    -6.34    0          0  
800057    -5.34    0          0
800057    -4.34    0          0     
800057        0    0.33       1 
800119    -0.88    0          0 
800119        0    0          1  
800119        0    1          0
834011     -4.1    0          0 
834011     -3.1    0          0 
834341        0    1.34       0
834341        0    2.34       0
834341        0    3.34       0  
834341        0    5.34       0    
834341        0    6.66       0
800125        0    0          1
800125        0    2.14       0
897177    -0.33    0          0
897177        0    0.67       1
897177        0    1.67       0", header=TRUE)

Does anyone know a solution for my issue? Many thanks in advance!

davidnortes · Accepted Answer

Your expected outcome comes from declaring the condition in ifelse as follows:

df$new.var1 <- ifelse((df$postLin > 0 & df$postLin < 1) | (df$preLin == 0 & df$postLin ==0), 1, 0)

value = 1 if df$postLin gets values between 0 and 1 (but not 0 and 1) or if both df$preLin and df$postLin are 0. Otherwise value = 0

Result:

       ID preLin postLin new.var1
1  800057  -8.55    0.00        0
2  800057  -6.34    0.00        0
3  800057  -5.34    0.00        0
4  800057  -4.34    0.00        0
5  800057   0.00    0.33        1
6  800119  -0.88    0.00        0
7  800119   0.00    0.00        1
8  800119   0.00    1.00        0
9  834011  -4.10    0.00        0
10 834011  -3.10    0.00        0
11 834341   0.00    1.34        0
12 834341   0.00    2.34        0
13 834341   0.00    3.34        0
14 834341   0.00    5.34        0
15 834341   0.00    6.66        0
16 800125   0.00    0.00        1
17 800125   0.00    2.14        0
18 897177  -0.33    0.00        0
19 897177   0.00    0.67        1
20 897177   0.00    1.67        0

Remark: keeping the condition as <=1 will result in an additional one in the third 800119, as @shirewoman2 said in her comment

R - Create new variable based on certain conditions

Answers (2)

Related Questions