ANmike

Reputation: 267

Row comparison in R

I have the data frame below:

    R_Number    A  
    1           0  
    2           15  
    3           10  
    4           11  
    5           12  
    6           18  
    7           19  
    8           15  
    9           17  
    10          11  

Now I need to create another column B that holds the result of comparing values in A. The comparison is not between consecutive rows: row 1 is compared with row 4, row 2 with row 5, and so on until the end of the data. The result of the comparison is defined as:

    if (A[1] >= 15 && A[4] <= 12) {
      B[1] <- 1
    } else if (A[1] <= 0 && A[4] >= 10) {
      B[1] <- 2
    } else {
      B[1] <- 0
    }

Rows 8, 9 and 10 have no row three positions ahead to compare with, so their value should be 0.

Also, the result of comparing rows 1 and 4 goes in row 1; similarly, the result of comparing rows 2 and 5 goes in row 2.

So the resulting dataframe should be as shown below

    R_Number    A       B  
    1           0       2
    2           15      1
    3           10      0 
    4           11      0
    5           12      0
    6           18      0
    7           19      1
    8           15      0
    9           17      0
    10          11      0

Upvotes: 4

Views: 3704

Answers (2)

Lorenzo Benassi

Reputation: 621

Following @nicola's comment, I tried to solve your problem as well. First I recreated your initial data frame, with B initialised to 0:

    df <- data.frame(R_Number = 1:10, A = c(0, 15, 10, 11, 12, 18, 19, 15, 17, 11), B = 0)

Then I used an if statement inside a for loop:

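    # Compare each row with the row three positions ahead.
    # seq_len() yields an empty sequence when df has fewer than four rows,
    # so the loop is skipped instead of running over 1:0.
    for (i in seq_len(max(nrow(df) - 3, 0))) {
      if (df$A[i] >= 15 && df$A[i + 3] <= 12) {
        df$B[i] <- 1
      } else if (df$A[i] <= 0 && df$A[i + 3] >= 10) {
        df$B[i] <- 2
      } else {
        df$B[i] <- 0
      }
    }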

With my last edit I fixed the problem that came up when the length of the data frame changed, so the solution is now generic.
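
The same rule can also be written without an explicit loop. What follows is only a minimal sketch in base R, assuming the same data and the same offset of 3; the names `offset` and `A_ahead` are illustrative and not part of the original answer. It relies on the fact that indexing a vector past its end returns `NA`.

```r
df <- data.frame(R_Number = 1:10,
                 A = c(0, 15, 10, 11, 12, 18, 19, 15, 17, 11))

offset <- 3                      # row i is compared with row i + offset
n <- nrow(df)

# Value of A three rows ahead; positions past the end come back as NA
A_ahead <- df$A[seq_len(n) + offset]

# Nested ifelse() mirrors the if / else if / else chain
B <- ifelse(df$A >= 15 & A_ahead <= 12, 1,
            ifelse(df$A <= 0 & A_ahead >= 10, 2, 0))

# Rows without a partner row produce NA; the question asks for 0 there
B[is.na(B)] <- 0
df$B <- B

print(df$B)
```

Here `&` replaces `&&` because the conditions are evaluated for all rows at once rather than one row per iteration.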

Upvotes: 2

Niek

Reputation: 1624

First lagging the variable and then computing your new variable should work. Something like this:

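    library(Hmisc)
    df <- data.frame(R_Number = 1:10, A = c(0, 15, 10, 11, 12, 18, 19, 15, 17, 11))
    # A negative shift makes Lag() look ahead: element i holds A[i + 3], with NA for the last three rows
    A_Lag <- Lag(df$A, -3)
    # The first condition contributes 1 and the second contributes 2; NA comparisons are dropped, giving 0
    df$B <- rowSums(cbind(df$A >= 15 & A_Lag <= 12, (df$A <= 0 & A_Lag >= 10) * 2), na.rm = TRUE)
    df$B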

I tried to avoid if statements. The Lag function can be found in the Hmisc package.

    > df$B
     [1] 2 1 0 0 0 0 1 0 0 0
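
If installing Hmisc is not an option, the look-ahead step can probably be reproduced in base R. The sketch below assumes a hypothetical helper, `lead_by()`, which is not part of any package; it keeps the same arithmetic on logical values in place of if statements.

```r
# Hypothetical helper: element i of the result is x[i + k], NA past the end
lead_by <- function(x, k) x[seq_along(x) + k]

df <- data.frame(R_Number = 1:10,
                 A = c(0, 15, 10, 11, 12, 18, 19, 15, 17, 11))

A_lead <- lead_by(df$A, 3)

# TRUE/FALSE are coerced to 1/0, so each condition contributes its own code
B <- (df$A >= 15 & A_lead <= 12) * 1 + (df$A <= 0 & A_lead >= 10) * 2

# Comparisons against NA (the last three rows) are set to 0
B[is.na(B)] <- 0
df$B <- B

print(df$B)
```

Because the two conditions cannot both hold for the same row (A cannot be both at least 15 and at most 0), adding the two terms never produces 3.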

Upvotes: 1
