Reputation: 322
I have a data frame with 2 columns, "time" and "a".
df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))
How can I check whether the values change over time? I need a new column "comp" in the data frame that shows whether each value in column "a" is the same as the two values before it and the two values after it in the same column. The result could look like this:
df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a = c(3, 8, 2, 2, 2, 2, 2, 4, 5), comp = c(F, F, F, F, T, F, F, F, F))
In the end I need to do this for a column with about 3 million observations.
Upvotes: 1
Views: 1082
Reputation: 872
If I understand right, you're looking for values that are the same as their 2 adjacent values on either side, and in this case you're happy to ignore the 'missing' adjacent values for the 2 first & 2 last values.
Using base R:
sameasadj=function(v,n=2,include_ends=T) {
  if(include_ends){vv=c(rep(head(v,1),n),v,rep(tail(v,1),n))}
  else {vv=c(rep(NA,n),v,rep(NA,n))}
  sapply(seq_along(v),function(i) diff(range(vv[i:(i+2*n)]))==0)
}
df$comp = sameasadj(df$a)
df$comp
Output:
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
Explanation:
sameasadj=function(v,n=2,include_ends=T)
= define the function sameasadj to test whether each value is the same as its adjacent neighbours on each side. We can choose the number n of adjacent neighbours on each side (in your case 2), and whether or not to include the ends (or to return 'NA' for these, since they lack enough neighbours on one side).
if(include_ends){vv=c(rep(head(v,1),n),v,rep(tail(v,1),n))}
= if we want to include the ends, then we just add the 'missing' neighbours so that they match
else {vv=c(rep(NA,n),v,rep(NA,n))}
= otherwise we add 'NA' values
sapply(seq_along(v),function(i)
= go along each position i in the vector...
diff(range(vv[i:(i+2*n)]))==0)
= ...and check whether the elements from i to i+2*n are all the same (diff(range(x))==0 will return TRUE if all elements of x are the same)
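A quick way to convince yourself of the diff(range(x))==0 idiom is to try it on two small windows. The vectors x_same and x_mixed below are made-up illustrations, not part of the original data:

```r
# Hypothetical example windows, only for illustration
x_same  <- c(2, 2, 2, 2, 2)
x_mixed <- c(2, 2, 2, 4, 5)

# range() returns the smallest and largest value; diff() subtracts them
range(x_mixed)

# TRUE only when the smallest and largest value coincide
diff(range(x_same)) == 0
diff(range(x_mixed)) == 0

# A window that contains NA gives NA rather than TRUE or FALSE
diff(range(c(NA, 2, 2))) == 0
```

The last line is why the include_ends=F variant returns NA at the ends: every window that touches the padding contains an NA.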
Putting it all into a function makes it easy to change your mind later about the number of adjacent neighbours, or what to do with the ends...
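As a small sketch of what that flexibility looks like, here is the same function (restated so the snippet runs on its own, with TRUE spelled out) called with a different n and with include_ends switched off. The vector a is the column from the question:

```r
# Same function as above, restated so this snippet is self-contained
sameasadj <- function(v, n = 2, include_ends = TRUE) {
  if (include_ends) {
    # pad both ends with copies of the first/last value
    vv <- c(rep(head(v, 1), n), v, rep(tail(v, 1), n))
  } else {
    # pad both ends with NA so incomplete windows give NA
    vv <- c(rep(NA, n), v, rep(NA, n))
  }
  sapply(seq_along(v), function(i) diff(range(vv[i:(i + 2 * n)])) == 0)
}

a <- c(3, 8, 2, 2, 2, 2, 2, 4, 5)

# Only 1 neighbour on each side
sameasadj(a, n = 1)
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE

# 2 neighbours, but NA where there are not enough neighbours
sameasadj(a, include_ends = FALSE)
# [1]    NA    NA FALSE FALSE  TRUE FALSE FALSE    NA    NA
```

Note that sapply loops over every position, so on a few million rows this is likely to be noticeably slower than a vectorised lag/lead comparison.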
Upvotes: 1
Reputation: 388982
A similar solution to @Bas using data.table
library(data.table)
setDT(df)[, comp := a == shift(a) & a == shift(a, 2) &
            a == shift(a, type = 'lead') & a == shift(a, 2, type = 'lead')]
#   time a  comp
#1:    1 3 FALSE
#2:    2 8 FALSE
#3:    3 2 FALSE
#4:    4 2 FALSE
#5:    5 2  TRUE
#6:    6 2 FALSE
#7:    7 2 FALSE
#8:    8 4 FALSE
#9:    9 5 FALSE
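If the number of neighbours might change, the four comparisons can be generated instead of typed out. This is only a sketch: the variable n is a name introduced here for illustration, and the result is assumed to match the explicit version above for n = 2.

```r
library(data.table)

df <- data.frame(time = 1:9, a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))
n <- 2  # neighbours to check on each side (illustrative name, not from the answer)

setDT(df)
# For each offset k in 1..n, compare a with its k-th lag and k-th lead,
# then combine all the per-offset results with &
df[, comp := Reduce(`&`, lapply(seq_len(n), function(k) {
  a == shift(a, k, type = "lag") & a == shift(a, k, type = "lead")
}))]

df$comp
# [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
```

As with the explicit version, rows near the ends are compared against NA, so they can come out as NA rather than FALSE when every available neighbour matches.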
Upvotes: 3
Reputation: 4658
Using the tidyverse:
library(tidyverse)
df %>%
  arrange(time) %>%
  mutate(comp = a == lag(a) & a == lag(a, 2) & a == lead(a) & a == lead(a, 2))
#   time a  comp
# 1    1 3 FALSE
# 2    2 8 FALSE
# 3    3 2 FALSE
# 4    4 2 FALSE
# 5    5 2  TRUE
# 6    6 2 FALSE
# 7    7 2 FALSE
# 8    8 4 FALSE
# 9    9 5 FALSE
Upvotes: 3