P. Denelle

Reputation: 830

Equal pairs of rows in a data frame in R

I would like to construct a matrix where each cell tells whether the corresponding pair of rows of a data.frame is equal, i.e. whether the two rows share the same group.

For example, with this data.frame:

set.seed(1)
ex <- data.frame(id = paste0("id", c(1:5)),
                 group = sample(c("a", "b", "c"), 5, replace = TRUE))
> ex
   id group
1 id1     a
2 id2     c
3 id3     a
4 id4     b
5 id5     a

I would like to obtain the following matrix:

      id1   id2   id3   id4   id5
id1  TRUE FALSE  TRUE FALSE  TRUE
id2 FALSE  TRUE FALSE FALSE FALSE
id3  TRUE FALSE  TRUE FALSE  TRUE
id4 FALSE FALSE FALSE  TRUE FALSE
id5  TRUE FALSE  TRUE FALSE  TRUE

Upvotes: 0

Views: 41

Answers (2)

Georgery

Reputation: 8117

Here is a tidyverse solution (dplyr for the joins, tidyr for the reshaping):

# load the tidyverse package
library(tidyverse)

# this is your dataframe
set.seed(1)
ex <- data.frame(id = paste0("id", c(1:5)),
                 group = sample(c("a", "b", "c"), 5, replace = TRUE))

# now we create another dataframe that contains all ID combinations
df <- expand.grid(laterColumn = ex$id
            , laterRow = ex$id)

# now we take this dataframe
df %>% # then
    left_join(ex, by = c("laterColumn" = "id")) %>% # left join the groups on one ID
    left_join(ex, by = c("laterRow" = "id")) %>% # and then left join again on the other ID
    mutate(sameGroup = group.x == group.y) %>% # now we compare whether the groups are the same
    select(-group.x, -group.y) %>% # and remove the unnecessary group columns
    spread(key = laterColumn, value = sameGroup) # and finally bring it from a long into a wide format
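
The pipeline above returns a wide data frame rather than a matrix. As a minimal sketch, assuming a tidyr version that provides pivot_wider() (the newer interface that supersedes spread()), the same steps can be finished off with a conversion to a logical matrix. The names pairs, wide and same_group are placeholders chosen for this illustration, and stringsAsFactors = FALSE is an added assumption to keep the IDs as plain character strings.

```r
library(dplyr)
library(tidyr)

set.seed(1)
ex <- data.frame(id = paste0("id", 1:5),
                 group = sample(c("a", "b", "c"), 5, replace = TRUE))

# All ordered pairs of IDs, kept as character rather than factor
pairs <- expand.grid(laterColumn = ex$id, laterRow = ex$id,
                     stringsAsFactors = FALSE)

wide <- pairs %>%
  left_join(ex, by = c("laterColumn" = "id")) %>%   # group of the column ID
  left_join(ex, by = c("laterRow" = "id")) %>%      # group of the row ID
  mutate(sameGroup = group.x == group.y) %>%        # compare the two groups
  select(-group.x, -group.y) %>%
  pivot_wider(names_from = laterColumn, values_from = sameGroup)

# Drop the ID column and reuse it as row names of a logical matrix
same_group <- as.matrix(wide[-1])
rownames(same_group) <- wide$laterRow
```

Indexing by name then works directly, for example same_group["id1", "id3"].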

Upvotes: 2

Ronak Shah

Reputation: 389047

We can use outer to compare every value of ex$group with every value of ex$group, including itself.

outer(ex$group, ex$group, `==`)

#      [,1]  [,2]  [,3]  [,4]  [,5]
#[1,]  TRUE FALSE  TRUE FALSE  TRUE
#[2,] FALSE  TRUE FALSE FALSE FALSE
#[3,]  TRUE FALSE  TRUE FALSE  TRUE
#[4,] FALSE FALSE FALSE  TRUE FALSE
#[5,]  TRUE FALSE  TRUE FALSE  TRUE

If we need row names and column names, we can wrap the result in matrix and pass dimnames:

matrix(outer(ex$group, ex$group, `==`), nrow = nrow(ex), 
       dimnames = list(ex$id, ex$id))

#      id1   id2   id3   id4   id5
#id1  TRUE FALSE  TRUE FALSE  TRUE
#id2 FALSE  TRUE FALSE FALSE FALSE
#id3  TRUE FALSE  TRUE FALSE  TRUE
#id4 FALSE FALSE FALSE  TRUE FALSE
#id5  TRUE FALSE  TRUE FALSE  TRUE
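
As a possible shortcut, outer takes its dimnames from the names of its vector arguments, so naming the group vector by id first should give the labelled matrix in one call. This is only a sketch: g and same_group are names made up for the illustration, and the as.character() call is an added guard for the case where group is stored as a factor (the data.frame default before R 4.0).

```r
set.seed(1)
ex <- data.frame(id = paste0("id", 1:5),
                 group = sample(c("a", "b", "c"), 5, replace = TRUE))

# Name each group value by its id; outer() then labels rows and columns itself
g <- setNames(as.character(ex$group), ex$id)
same_group <- outer(g, g, "==")
```

The result is a logical matrix indexed by id, so same_group["id2", "id4"] looks up a single pair.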

Upvotes: 1
