How to create a new variable conditional on neighboring ID position in R?

Question

This is what my data looks like:

ID XYZ   N_ID1 N_ID2
1   10      3   4               
2   20      8   2           
3   50      6   5           
4   100     2   6           
5   70      7   10          
6   25      1   3           
7   30      2   4           
8   35      6   9
.   .       .   .
.   .       .   .
.   .       .   .

So, I have two variables here called 'N_ID1' and 'N_ID2' which are the 2 neighbors of the 'ID' variable.

I want to create a new variable based on: 'XYZ', 'N_ID1', 'N_ID2', such that the new variable is an average of 'XYZ' for the values at ID location of 'N_ID1' and 'N_ID2'.

So, if we look at the first row where ID=1, there, 'N_ID1' = 3, 'N_ID2'= 4. Now, my new variable should be an average of 'XYZ' value at ID=3 and 'XYZ' value at ID=4. Similarly for other rows as well..

This is what my final result should look like:

ID XYZ   N_ID1 N_ID2 New_Variable
1   10      3   4   (50+100)/2 = 75             
2   20      8   2   (35+20)/2 = 27.5            
3   50      6   5   (25+70)/2 = 47.5        
4   100     2   6       .   
5   70      7   10      .   
6   25      1   3       .   
7   30      2   4       .   
8   35      6   9       .
.   .       .   .       .
.   .       .   .       .
.   .       .   .       .

So, as you can see above, the first value in the 'New_Variable' = 75 which is the average of ID#3 and ID#4 for 'XYZ'

Can anyone please tell me how to do this in R?

thelatemail · Accepted Answer

match each N_IDx to ID, subset XYZ, add +, and divide.

Reduce(`+`,
  lapply(dat[c("N_ID1","N_ID2")], function(x) dat$XYZ[match(x,dat$ID)] )
) / 2
#[1] 75.0 27.5 47.5 22.5   NA 30.0 60.0   NA

Without the functional approach, it would just be:

with(dat, (XYZ[match(N_ID1, ID)] + XYZ[match(N_ID2, ID)]) / 2 )

But this will become painful if you have many variables to sum.

How to create a new variable conditional on neighboring ID position in R?

Answers (1)

Related Questions