Martijn

Reputation: 129

R - How can I check if a value in a row is different from the value in the previous row?

I would like to add a column to my table that compares each value in an existing column 'farm' with the value in the previous row (to check whether it is the same), and that also checks whether the value in the current row is "NULL". The goal is for the new column 'switch' to contain "new" when the value of 'farm' in that row differs from the value of 'farm' in the previous row (except when the value in 'farm' is "NULL", in which case I would like to get back "").

The desired output is shown below:

farm    switch
A   
A   
NULL    
B   new
B   
B   
A   new
A   
A   
B   new
B   
B   
NULL    
A   new
A   

I tried to solve this using the below code:

#To add a new column switch
MyData["switch"] <- NA

#To check if the value is different from the previous row; and if the value is different from NULL
MyData$switch <- ifelse((MyData$farm == lag(MyData$farm))||MyData$farm=="NULL","",MyData$farm)

But when I use this code, the added column contains only empty values. Can somebody please explain what I am doing wrong and suggest code that works?

Upvotes: 1

Views: 4686

Answers (1)

akrun

Reputation: 887118

We create a logical index ('ind') by comparing each row with the previous row (we do that by dropping the first element of the 'farm' column on one side of the comparison and the last element on the other), and also include the condition that the element is not "NULL". Based on the logical index, we can change TRUE to 'New' and FALSE to '' with ifelse.

ind <- with(MyData, c(FALSE, farm[-1L]!= farm[-length(farm)]) & farm!='NULL')
MyData$switch <- ifelse(ind, 'New', '')

MyData
#   farm switch
#1     A       
#2     A       
#3  NULL       
#4     B    New
#5     B       
#6     B       
#7     A    New
#8     A       
#9     A       
#10    B    New
#11    B       
#12    B       
#13 NULL       
#14    A    New
#15    A       

To understand the concept of [-1L] and -length, suppose we have a vector

v1 <- c(2, 2, 3, 1, 5)
v1[-1] #removes the first observation
#[1] 2 3 1 5

v1[-length(v1)]# removes the last one
#[1] 2 2 3 1

When we compare these two, we are comparing each element (v1[-1]) against the element before it (v1[-length(v1)]). As the result is one element shorter than 'v1', we prepend a FALSE for the first element, which has no previous value to differ from

 c(FALSE, v1[-1]!= v1[-length(v1)])
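
As a quick check, this is what that expression evaluates to for the example vector. It is only a small sketch, and 'changed' is an illustrative name rather than part of the solution above.

```r
v1 <- c(2, 2, 3, 1, 5)

# Element i is TRUE when v1[i] differs from v1[i - 1];
# the leading FALSE stands in for the first element.
changed <- c(FALSE, v1[-1] != v1[-length(v1)])
changed
#[1] FALSE FALSE  TRUE  TRUE  TRUE
```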

In your case there is a second condition: the value must not be "NULL". When we combine the two conditions with &, only the positions that are TRUE in both come out as TRUE; the rest are FALSE.
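
If it helps, the two conditions can be wrapped in a small function. This is only a sketch in base R: the name 'flag_switch' and its arguments 'label' and 'skip' are made up for illustration, and it assumes the column is a character vector with the literal string "NULL" (not a real NA) marking missing farms.

```r
# Hypothetical helper: flag elements that differ from the previous element,
# ignoring elements equal to `skip` (the string "NULL" by default).
flag_switch <- function(x, label = "New", skip = "NULL") {
  if (length(x) == 0L) return(character(0))
  # TRUE where x[i] != x[i - 1]; the first element is never flagged
  changed <- c(FALSE, x[-1L] != x[-length(x)])
  ifelse(changed & x != skip, label, "")
}

farm <- c("A", "A", "NULL", "B", "B", "A")
flag_switch(farm)
#[1] ""    ""    ""    "New" ""    "New"
```

With the data from the question this would be called as MyData$switch <- flag_switch(MyData$farm); passing label = "new" gives the lowercase value shown in the desired output.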

data

MyData <- structure(list(farm = c("A", "A", "NULL", "B", "B", "B", "A", 
"A", "A", "B", "B", "B", "NULL", "A", "A")), .Names = "farm",
class =  "data.frame", row.names = c(NA, -15L))

Upvotes: 6
