Bolle

Reputation: 322

Compare multiple values with multiple values in an R data frame

I have a data frame with 2 columns, "time" and "a".

df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))

How can I check whether the values changed over time? I need a new column "comp" in the data frame that shows whether each value in column "a" is the same as the two values before it and the two values after it in the same column. So the result could look like this:

df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a = c(3, 8, 2, 2, 2, 2, 2, 4, 5), comp = c(F, F, F, F, T, F, F, F, F))

In the end I need to run this comparison on a column with about 3 million observations.

Upvotes: 1

Views: 1082

Answers (3)

Dominic van Essen

Reputation: 872

If I understand correctly, you're looking for values that are the same as their 2 adjacent values on either side, and in this case you're happy to ignore the 'missing' adjacent values for the first 2 and last 2 values.

Using base R:

sameasadj=function(v,n=2,include_ends=T) {
    if(include_ends){vv=c(rep(head(v,1),n),v,rep(tail(v,1),n))} 
    else {vv=c(rep(NA,n),v,rep(NA,n))}
    sapply(seq_along(v),function(i) diff(range(vv[i:(i+2*n)]))==0)
}

df$comp = sameasadj(df$a)
df$comp

Output:

[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

Explanation:

sameasadj=function(v,n=2,include_ends=T) = define function sameasadj to test whether each value is the same as its adjacent neighbours on each side. We can choose the number n of adjacent neighbours (in your case 2), and whether or not to include the ends (or to return 'NA' for these, since they lack enough neighbours on one side).

if(include_ends){vv=c(rep(head(v,1),n),v,rep(tail(v,1),n))} = if we want to include the ends, then we just add the 'missing' neighbours by repeating the first and last values n times, so that they match

else {vv=c(rep(NA,n),v,rep(NA,n))} = otherwise we add 'NA' values

sapply(seq_along(v),function(i) = go along each position i in the vector...

diff(range(vv[i:(i+2*n)]))==0) = ...and check whether the elements from i to i+2*n are all the same (diff(range(x))==0 will return TRUE if all elements of x are the same)
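
To see the diff(range(x))==0 check in isolation, here is a minimal sketch. The name all_same is made up for this illustration, and numeric input is assumed:

```r
# Hypothetical helper wrapping the check used inside sameasadj
all_same <- function(x) diff(range(x)) == 0

all_same(c(2, 2, 2, 2, 2))   # TRUE: min and max coincide
all_same(c(2, 2, 2, 4, 5))   # FALSE: max - min is 3
all_same(c(NA, 2, 2, 2, 2))  # NA: range() propagates the missing value
```

The last case is why include_ends=F yields NA for every window that reaches into the NA padding.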

Putting it all into a function makes it easy to change your mind later about the number of adjacent neighbours, or what to do with the ends...
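
Since the question mentions about 3 million rows, a per-element sapply loop may be slow at that size. One possible vectorised alternative (a different technique from the function above) works on run lengths from rle(): a value has n equal neighbours on each side exactly when it sits more than n positions from the start of its run and at least n positions before the end. This sketch assumes that positions without n real neighbours on each side should be FALSE rather than padded, and the name same_as_adj_rle is made up for illustration:

```r
# Hypothetical vectorised variant based on run-length encoding.
# Unlike the padded version, the ends are never TRUE here.
same_as_adj_rle <- function(v, n = 2) {
  r <- rle(v)
  run_len    <- rep(r$lengths, r$lengths)  # length of the run each element belongs to
  pos_in_run <- sequence(r$lengths)        # 1-based position of each element within its run
  # at least n equal values before AND at least n equal values after
  pos_in_run > n & (run_len - pos_in_run) >= n
}

df <- data.frame(time = 1:9, a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))
df$comp <- same_as_adj_rle(df$a)
df$comp
# [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
```

The result differs from the padded include_ends=T behaviour only when the vector starts or ends with a run of identical values.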

Upvotes: 1

Ronak Shah

Reputation: 388982

A solution similar to @Bas's, using data.table:

library(data.table)
setDT(df)[, comp := a == shift(a) & a == shift(a, 2) & 
                  a == shift(a, type = 'lead') & a == shift(a, 2, type = 'lead')]

#   time a  comp
#1:    1 3 FALSE
#2:    2 8 FALSE
#3:    3 2 FALSE
#4:    4 2 FALSE
#5:    5 2  TRUE
#6:    6 2 FALSE
#7:    7 2 FALSE
#8:    8 4 FALSE
#9:    9 5 FALSE

Upvotes: 3

Bas

Reputation: 4658

Using the tidyverse:

library(tidyverse)

df %>% 
  arrange(time) %>% 
  mutate(comp = a == lag(a) & a == lag(a, 2) & a == lead(a) & a == lead(a, 2))

#   time a  comp
# 1    1 3 FALSE
# 2    2 8 FALSE
# 3    3 2 FALSE
# 4    4 2 FALSE
# 5    5 2  TRUE
# 6    6 2 FALSE
# 7    7 2 FALSE
# 8    8 4 FALSE
# 9    9 5 FALSE
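
One thing to watch with lag/lead comparisons: shifted values at the ends are NA, so an end position whose existing neighbours all match comes out as NA rather than FALSE. The sketch below emulates the shifts in base R so it runs without extra packages. shift_na is a hypothetical helper, not part of dplyr, and it assumes abs(k) is smaller than the vector length:

```r
# Hypothetical base R helper: shift x by k positions, filling with NA
# (k > 0 behaves like a lag, k < 0 like a lead).
shift_na <- function(x, k) {
  n <- length(x)
  if (k > 0) c(rep(NA, k), head(x, n - k)) else c(tail(x, n + k), rep(NA, -k))
}

a <- c(2, 2, 2, 2, 2, 4, 5)  # starts with a run of identical values
comp <- a == shift_na(a, 1) & a == shift_na(a, 2) &
        a == shift_na(a, -1) & a == shift_na(a, -2)
comp
# [1]    NA    NA  TRUE FALSE FALSE FALSE FALSE

# If FALSE is wanted instead of NA at the ends:
comp & !is.na(comp)
# [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
```

With the example data from the question this does not show up, because each end value differs from at least one existing neighbour and NA & FALSE is FALSE.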

Upvotes: 3
