Detecting identical observations across columns in R dataframe

Question

Within a dataframe in R, I'm trying to detect whether 5 consecutive observations are identical in a pseudorandom design.

In this design, two classes of objects are presented 5 times (A objects and B objects) and participants make a rating of them (A1 - B5). The ordering of each object is contained in columns as well (A1.order - B5.order). What I'd like to know is whether 5 consecutive observations (can be the first 5, middle 5, last 5, etc.) are identical. The typical functions for detecting whether columns are identical don't work here because I am concerned about the presentation order of the objects.

I'd like for a new column called "identical" to be added to the dataframe which will give me a TRUE or FALSE as to whether there are 5 identical consecutive observations.

I'd also like those identical observations to be recoded as NA.

df <- structure(list(ID = c(101, 102, 103, 104, 105, 106, 107), A1 = c(1, 
4, 1, 3, 4, 5, 3), A2 = c(1, 3, 1, 1, 5, 3, 2), A3 = c(1, 1, 
1, 5, 4, 3, 2), A4 = c(1, 3, 2, 1, 3, 2, 1), A5 = c(3, 1, 2, 
3, 5, 4, 5), B1 = c(3, 2, 1, 5, 1, 3, 2), B2 = c(1, 2, 1, 4, 
2, 4, 5), B3 = c(1, 2, 3, 2, 5, 4, 3), B4 = c(2, 3, 1, 4, 5, 
2, 3), B5 = c(2, 2, 2, 2, 2, 2, 2), A1.order = c(1, 2, 1, 2, 
1, 2, 1), A2.order = c(3, 4, 3, 4, 3, 4, 3), A3.order = c(5, 
6, 5, 6, 5, 6, 5), A4.order = c(7, 8, 7, 8, 7, 8, 7), A5.order = c(10, 
9, 10, 9, 10, 9, 10), B1.order = c(2, 1, 2, 1, 2, 1, 2), B2.order = c(4, 
3, 4, 3, 4, 3, 4), B3.order = c(6, 5, 6, 5, 6, 5, 6), B4.order = c(8, 
7, 8, 7, 8, 7, 8), B5.order = c(9, 10, 9, 10, 9, 10, 9)), row.names = c(NA, 
7L), class = "data.frame")

df

#>    ID A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 A1.order A2.order A3.order A4.order
#> 1 101  1  2  3  4  3  3  1  1  2  2        1        3        5        7
#> 2 102  4  3  1  3  1  2  2  2  3  2        2        4        6        8
#> 3 103  2  4  3  2  2  4  3  3  1  2        1        3        5        7
#> 4 104  3  1  5  1  3  5  4  2  4  2        2        4        6        8
#> 5 105  4  5  4  3  5  1  2  5  5  2        1        3        5        7
#> 6 106  5  3  3  2  4  3  4  4  2  2        2        4        6        8
#> 7 107  3  2  2  1  5  2  5  3  3  2        1        3        5        7
#>   A5.order B1.order B2.order B3.order B4.order B5.order
#> 1       10        2        4        6        8        9
#> 2        9        1        3        5        7       10
#> 3       10        2        4        6        8        9
#> 4        9        1        3        5        7       10
#> 5       10        2        4        6        8        9
#> 6        9        1        3        5        7       10
#> 7       10        2        4        6        8        9

Here is the output I want:

> df2
   ID A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 A1.order A2.order A3.order A4.order A5.order B1.order B2.order
1 101  3 NA NA NA  3  3 NA NA  2  2        1        3        5        7       10        2        4
2 102  4  3  1  3  1  2  2  2  3  2        2        4        6        8        9        1        3
3 103  1  1  1  2  2  1  1  3  1  2        1        3        5        7       10        2        4
4 104  3  1  5  1  3  5  4  2  4  2        2        4        6        8        9        1        3
5 105  4  5  4  3  5  1  2  5  5  2        1        3        5        7       10        2        4
6 106  5  3  3  2  4  3  4  4  2  2        2        4        6        8        9        1        3
7 107  3  2  2  1  5  2  5  3  3  2        1        3        5        7       10        2        4
  B3.order B4.order B5.order identical
1        6        8        9      TRUE
2        5        7       10     FALSE
3        6        8        9     FALSE
4        5        7       10     FALSE
5        6        8        9     FALSE
6        5        7       10     FALSE
7        6        8        9     FALSE

David Ranzolin · Accepted Answer

The rle function computes run lengths, i.e. how many times each value repeats consecutively in a vector. You can (1) create a detect_streak function that returns TRUE/FALSE; and (2) loop through the rows, checking each one for a streak of 5 or more.

detect_streak <- function(x) {
  streaks <- rle(x)$lengths
  return(any(streaks >= 5))
}
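
As a quick illustration of what rle returns, here is a minimal sketch on a made-up vector (the values in x are hypothetical, not taken from your data). It shows the run lengths and values, and the same `any(... >= 5)` test used inside detect_streak.

```r
# Made-up ratings, for illustration only
x <- c(2, 2, 2, 2, 2, 3, 1, 1)

runs <- rle(x)
runs$lengths   # 5 1 2  (five 2s, one 3, two 1s)
runs$values    # 2 3 1

# Same check as in detect_streak: is any run at least 5 long?
has_streak <- any(runs$lengths >= 5)
has_streak     # TRUE
```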

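# Pre-allocate the result column, then check each row's ratings (columns 2:11 = A1..B5)
df$identical <- vector(mode = "logical", length = nrow(df))
for (i in seq_len(nrow(df))) {
  # unlist() turns the one-row data frame into a plain vector for rle()
  df$identical[i] <- detect_streak(unlist(df[i, 2:11]))
}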

In the data you've provided, there are no streaks of 5 or more across A1:B5 when the columns are read in their stored order.
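
If the streak has to follow presentation order rather than column order, one possible approach is to sort each row's ratings by the matching .order columns before calling rle, and then set the ratings inside any long run to NA. The following is only a sketch under assumptions: the helper name flag_streaks, its min_run argument, and the two-row toy data frame are hypothetical, and it assumes every rating column X has a companion column named X.order.

```r
# Hypothetical helper: flag rows with a run of >= min_run identical ratings
# in presentation order, and recode the ratings in that run as NA.
flag_streaks <- function(df, rating_cols, min_run = 5) {
  order_cols <- paste0(rating_cols, ".order")  # assumed naming convention
  df$identical <- FALSE
  for (i in seq_len(nrow(df))) {
    ratings <- unlist(df[i, rating_cols])
    pos     <- unlist(df[i, order_cols])
    shown   <- order(pos)                      # column indices in presentation order
    runs    <- rle(unname(ratings[shown]))
    # TRUE for every presented item that sits inside a long enough run
    in_run  <- rep(runs$lengths >= min_run, times = runs$lengths)
    if (any(in_run)) {
      df$identical[i] <- TRUE
      df[i, rating_cols[shown[in_run]]] <- NA
    }
  }
  df
}

# Toy data laid out like the question's data frame (two rows only)
toy <- data.frame(
  ID = c(101, 102),
  A1 = c(1, 4), A2 = c(1, 3), A3 = c(1, 1), A4 = c(1, 3), A5 = c(3, 1),
  B1 = c(3, 2), B2 = c(1, 2), B3 = c(1, 2), B4 = c(2, 3), B5 = c(2, 2),
  A1.order = c(1, 2), A2.order = c(3, 4), A3.order = c(5, 6),
  A4.order = c(7, 8), A5.order = c(10, 9),
  B1.order = c(2, 1), B2.order = c(4, 3), B3.order = c(6, 5),
  B4.order = c(8, 7), B5.order = c(9, 10)
)

rating_cols <- c(paste0("A", 1:5), paste0("B", 1:5))
out <- flag_streaks(toy, rating_cols)
out$identical   # TRUE FALSE
```

In the first toy row the presentation order is A1, B1, A2, B2, A3, B3, A4, B4, B5, A5, which gives the ratings 1, 3, 1, 1, 1, 1, 1, 2, 2, 3. The run of five 1s covers A2, B2, A3, B3 and A4, so those five cells become NA and identical is TRUE for that row.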
