markrt

Reputation: 111

Crosstab of two identical variables in R - reflect in diagonal

I've got a dataset where I'm interested in how often different pairs occur, but the order of the elements within a pair doesn't matter. For example:

library(janitor)

set.seed(24601)

options <- c("a", "b", "c", "d", "e", "f")

data.frame(x = sample(options, 20, replace = TRUE),
           y = sample(options, 20, replace = TRUE)) %>% 
  tabyl(x, y)

provides me with the output

 x a b c d e f
 a 1 0 1 0 1 0
 b 0 2 0 1 0 0
 c 2 0 1 0 0 0
 d 0 0 0 0 1 0
 e 1 1 2 0 0 3
 f 0 0 1 1 0 1

Ideally I'd have just the top-right (or bottom-left) triangle of this table, with each cell holding the total for that unordered pair. For example, the combination of a and c would total 3: the sum of 1 (row a, column c) and 2 (row c, column a). And so on for every other pair of values.

I'm sure there must be a simple way to do this, but I can't figure out what it is...

Edited to add (thanks @Akrun for the request): ideally I'd like the following output


x a b c d e f
a 1 0 3 0 2 0
b   2 0 1 1 0
c     1 0 2 1
d       0 1 1
e         0 3
f           1

Upvotes: 2

Views: 109

Answers (2)

ThomasIsCoding

Reputation: 102609

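Here is another option, using igraph. Reading the table as a directed adjacency matrix and then rebuilding the graph as undirected merges the (x, y) and (y, x) counts:

library(igraph)

# `out` is the tabyl from the question
out[-1] <- get.adjacency(
  graph_from_data_frame(
    get.data.frame(
      graph_from_adjacency_matrix(
        as.matrix(out[-1]), "directed"
      )
    ), FALSE
  ),
  type = "upper",
  sparse = FALSE
)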

which gives

> out
 x a b c d e f
 a 1 0 3 0 2 0
 b 0 2 0 1 1 0
 c 0 0 1 0 2 1
 d 0 0 0 0 1 1
 e 0 0 0 0 0 3
 f 0 0 0 0 0 1
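
For anyone who wants to see the same idea without the igraph dependency, here is a minimal base R sketch. It is not the code from this answer: it swaps the graph round-trip for an explicit edge list in which each pair is sorted before re-tabulating. The names `lv`, `m`, `edges` and `und` are made up for illustration, and the counts are typed in by hand from the table printed in the question.

```r
# Counts copied by hand from the table in the question (rows = x, columns = y).
lv <- c("a", "b", "c", "d", "e", "f")
m <- matrix(c(1, 0, 1, 0, 1, 0,
              0, 2, 0, 1, 0, 0,
              2, 0, 1, 0, 0, 0,
              0, 0, 0, 0, 1, 0,
              1, 1, 2, 0, 0, 3,
              0, 0, 1, 1, 0, 1),
            nrow = 6, byrow = TRUE, dimnames = list(x = lv, y = lv))

# One row per (x, y) cell, with its count in column n.
edges <- as.data.frame(as.table(m), responseName = "n",
                       stringsAsFactors = FALSE)

# Treat each pair as unordered: the alphabetically earlier label goes first.
edges$lo <- factor(pmin(edges$x, edges$y), levels = lv)
edges$hi <- factor(pmax(edges$x, edges$y), levels = lv)

# Re-tabulate; every count now lands on or above the diagonal.
und <- xtabs(n ~ lo + hi, data = edges)
und
```

As with the igraph version, the lower triangle comes back as zeros rather than blanks.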

Upvotes: 1

akrun

Reputation: 887811

We could add the table to its transpose (leaving out the first column), then replace the upper-triangle values of 'out' with the corresponding sums (upper.tri returns a logical matrix that we use to subset the elements), and finally set the lower-triangle elements to NA. The diagonal is never overwritten, so it keeps its original counts.

out2 <- out[-1] + t(out[-1])
out[-1][upper.tri(out[-1])] <- out2[upper.tri(out2)]
out[-1][lower.tri(out[-1])] <- NA

-output

out
# x  a  b  c  d  e f
# a  1  0  3  0  2 0
# b NA  2  0  1  1 0
# c NA NA  1  0  2 1
# d NA NA NA  0  1 1
# e NA NA NA NA  0 3
# f NA NA NA NA NA 1
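
If it helps to check the logic without janitor, here is a rough sketch of the same transpose trick on a plain matrix. The names `lv`, `m` and `s` are only for illustration, and the counts are typed in from the table printed in the question rather than regenerated from the seed.

```r
# Counts copied by hand from the table in the question (rows = x, columns = y).
lv <- c("a", "b", "c", "d", "e", "f")
m <- matrix(c(1, 0, 1, 0, 1, 0,
              0, 2, 0, 1, 0, 0,
              2, 0, 1, 0, 0, 0,
              0, 0, 0, 0, 1, 0,
              1, 1, 2, 0, 0, 3,
              0, 0, 1, 1, 0, 1),
            nrow = 6, byrow = TRUE, dimnames = list(x = lv, y = lv))

# Adding the transpose gives count(i, j) + count(j, i) in every off-diagonal cell.
s <- m + t(m)

# The sum doubles the diagonal, so put the original diagonal counts back.
diag(s) <- diag(m)

# Blank out the redundant lower triangle.
s[lower.tri(s)] <- NA
s
```

The only difference from the code above is that the whole summed matrix is kept and its diagonal restored, instead of copying just the upper triangle back into the original.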

data

library(janitor)

set.seed(24601)
options <- c("a", "b", "c", "d", "e", "f")
out <- data.frame(x = sample(options, 20, replace = TRUE),
                  y = sample(options, 20, replace = TRUE)) %>%
  tabyl(x, y)

Upvotes: 1
