user2507608

Reputation: 365

Cross tabulation of pair-wise differences

Suppose I have a data frame with two variables holding two indices calculated for different groups (A, B and C, for example). The data frame is essentially:

 >df
   Group      v.1      v.2
     A         2        3
     B         4        4
     C         7        9

I would like to calculate the pairwise differences for each variable (v.1 and v.2) and then present the result in a cross-tabulation format, so that the values below the diagonal give the pairwise differences in v.1 and the values above the diagonal give the pairwise differences in v.2. The result would look like:

        A       B       C 
   A    0       1       6
   B    2       0       5
   C    5       3       0

Is there a package that would help me achieve this? Any suggestions would be welcome.

Upvotes: 2

Views: 402

Answers (3)

Falk Mielke

Reputation: 1

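The accepted answer by @A5C1D2H2I1M1N2O1R2T1 is not entirely correct because of the index order in the upper.tri assignment: upper.tri() selects cells column by column, which is not the order in which combn() generates pairs. You will only see this with four or more elements.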

df <- data.frame(list(
    "Group" = c("A", "B", "C", "D"),
    "v.1" =   c( 1,   1,   1,   3 ),
    "v.2" =   c( 1,   1,   1,   2 )
    ))

m <- matrix(0, nrow = nrow(df), ncol = nrow(df),
            dimnames=list(df$Group, df$Group))

m[lower.tri(m)] <- combn(df$v.1, 2, FUN=diff)
m[upper.tri(m)] <- combn(df$v.2, 2, FUN=diff)
m
#   A B C D
# A 0 0 0 0
# B 0 0 1 1
# C 0 0 0 1
# D 2 2 2 0

(Note the m[2,3] element, which should be at m[1,4] instead.)
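To see where the mismatch comes from, here is a small sketch (base R only; the names m4, combn_pairs, upper_cells and lower_cells are just illustrative) that compares the order in which upper.tri() and lower.tri() pick cells with the order in which combn() generates pairs, for four elements:

```r
m4 <- matrix(0, nrow = 4, ncol = 4)

# Pairs (i, j) with i < j, in the order combn() generates them
combn_pairs <- t(combn(4, 2))

# Cells picked by upper.tri() / lower.tri(), in the order they get filled
# (column-major, i.e. down each column in turn)
upper_cells <- which(upper.tri(m4), arr.ind = TRUE)
lower_cells <- which(lower.tri(m4), arr.ind = TRUE)

# upper.tri() visits (2,3) before (1,4); combn() yields (1,4) before (2,3)
all(upper_cells == combn_pairs)                    # FALSE
# lower.tri() matches combn() once row and column are swapped
all(lower_cells[, c("col", "row")] == combn_pairs) # TRUE
```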

Solution: I would suggest filling the lower triangle and then transposing, because lower.tri() does select cells in the same order as combn(). Maybe there is an easier way?

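n <- matrix(0, nrow = nrow(df), ncol = nrow(df),
            dimnames=list(df$Group, df$Group))
# fill v.2 below the diagonal (same order as combn), then transpose it above
n[lower.tri(n)] <- combn(df$v.2, 2, FUN=diff)
n <- t(n)
n[lower.tri(n)] <- combn(df$v.1, 2, FUN=diff)
n
#   A B C D
# A 0 0 0 1
# B 0 0 0 1
# C 0 0 0 1
# D 2 2 2 0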
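One possibly simpler alternative (a sketch, not something the answers here use) is to index the matrix directly with the pair indices from combn(), via R's two-column matrix indexing, so the fill order of upper.tri() and lower.tri() no longer matters. It redefines the four-group df so it runs on its own; p and idx are illustrative names:

```r
df <- data.frame(list(
    "Group" = c("A", "B", "C", "D"),
    "v.1" =   c( 1,   1,   1,   3 ),
    "v.2" =   c( 1,   1,   1,   2 )
    ))

p <- matrix(0, nrow = nrow(df), ncol = nrow(df),
            dimnames = list(df$Group, df$Group))

# Each row of idx is a pair (i, j) with i < j, in combn() order
idx <- t(combn(nrow(df), 2))

# Upper triangle: cell (i, j) gets v.2[j] - v.2[i]
p[idx] <- combn(df$v.2, 2, FUN = diff)
# Lower triangle: cell (j, i) gets v.1[j] - v.1[i]
p[idx[, 2:1]] <- combn(df$v.1, 2, FUN = diff)
p
#   A B C D
# A 0 0 0 1
# B 0 0 0 1
# C 0 0 0 1
# D 2 2 2 0
```

Because the same idx drives both the values (via combn()) and the target cells, the pairing cannot drift out of sync, whatever the number of groups.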


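The other answer, by @Valentin_Stefan, uses the outer product and gives the correct result for the OP's example. Note, however, that the clipping steps (mat_dif1[mat_dif1<0] <- 0 and mat_dif2[mat_dif2>0] <- 0) are only appropriate if each variable is non-decreasing from one group to the next. In the example below, v.2 drops from 2 to 1 between C and D, so that v.2 difference ends up below the diagonal (m[4,3] is 3 instead of 2):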

df <- data.frame(list(
    "Group" = c("A", "B", "C", "D"),
    "v.1" =   c( 1,   1,   1,   3 ),
    "v.2" =   c( 1,   1,   2,   1 )
    ))

# matrix of pairwise differences for v.1
mat_dif1 <- outer(X = df$v.1, Y = df$v.1, FUN = "-")
mat_dif1[mat_dif1<0] <- 0

# matrix of pairwise differences for v.2
mat_dif2 <- outer(X = df$v.2, Y = df$v.2, FUN = "-")
mat_dif2[mat_dif2>0] <- 0

mat_dif1 + abs(mat_dif2)
#       [,1] [,2] [,3] [,4]
# [1,]    0    0    1    0
# [2,]    0    0    1    0
# [3,]    0    0    0    0
# [4,]    2    2    3    0
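Whether the clipping trick is safe can be checked beforehand with base R's is.unsorted(), which returns TRUE when a vector is not in non-decreasing order. The clipping places every difference in the intended triangle only when both variables are non-decreasing (ties are fine, since they just give zeros). A minimal sketch; clipping_is_safe is a hypothetical helper name:

```r
# Hypothetical helper: the outer()-and-clip approach only puts each
# difference in the intended triangle when both columns are non-decreasing
clipping_is_safe <- function(x, y) {
  !is.unsorted(x) && !is.unsorted(y)
}

df_op <- data.frame(v.1 = c(2, 4, 7),    v.2 = c(3, 4, 9))
df_4  <- data.frame(v.1 = c(1, 1, 1, 3), v.2 = c(1, 1, 2, 1))

clipping_is_safe(df_op$v.1, df_op$v.2)  # TRUE
clipping_is_safe(df_4$v.1, df_4$v.2)    # FALSE: v.2 drops from 2 to 1
```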

The outer product is much FUN and could be used in comparable cases, but for the OP's question I think one cannot get by without upper.tri() and lower.tri():

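df <- data.frame(list(
    "Group" = c("A", "B", "C", "D"),
    "v.1" =   c( 1,   1,   1,   3 ),
    "v.2" =   c( 1,   1,   2,   1 )
    ))

# absolute pairwise differences for each variable
mat_dif1 <- outer(X = df$v.1, Y = df$v.1, FUN = function(X, Y) abs(Y-X))
mat_dif2 <- outer(X = df$v.2, Y = df$v.2, FUN = function(X, Y) abs(Y-X))
# v.1 goes below the diagonal and v.2 above; positions line up because
# both sides of each assignment use the same triangle
o <- matrix(0, nrow = nrow(df), ncol = nrow(df),
            dimnames=list(df$Group, df$Group))
o[lower.tri(o)] <- mat_dif1[lower.tri(mat_dif1)]
o[upper.tri(o)] <- mat_dif2[upper.tri(mat_dif2)]
o
#   A B C D
# A 0 0 1 0
# B 0 0 1 0
# C 0 0 0 1
# D 2 2 2 0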
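The abs() in this version drops the sign of each difference. If you want the signed convention of combn(..., FUN=diff) (later group minus earlier group, as in the OP's expected output), one possible variant, sketched below with illustrative names d1, d2 and s, takes the lower triangle from outer(v, v, "-") and the upper triangle from its transpose:

```r
df <- data.frame(list(
    "Group" = c("A", "B", "C", "D"),
    "v.1" =   c( 1,   1,   1,   3 ),
    "v.2" =   c( 1,   1,   2,   1 )
    ))

# d1[i, j] = v.1[i] - v.1[j]: below the diagonal, later minus earlier
d1 <- outer(df$v.1, df$v.1, FUN = "-")
# d2[i, j] = v.2[j] - v.2[i]: above the diagonal, later minus earlier
d2 <- t(outer(df$v.2, df$v.2, FUN = "-"))

s <- matrix(0, nrow = nrow(df), ncol = nrow(df),
            dimnames = list(df$Group, df$Group))
s[lower.tri(s)] <- d1[lower.tri(d1)]
s[upper.tri(s)] <- d2[upper.tri(d2)]
s
#   A B C  D
# A 0 0 1  0
# B 0 0 1  0
# C 0 0 0 -1
# D 2 2 2  0
```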

These strategies are, of course, much simpler and more efficient if you only need to cross-tabulate a single variable.
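For a single variable, for example, outer() alone already gives the full cross-tabulation of signed pairwise differences, with no triangle bookkeeping. A minimal sketch using the OP's v.1 values (the names v and d are illustrative):

```r
# The OP's v.1 values, named by group
v <- c(A = 2, B = 4, C = 7)

# d[i, j] = v[i] - v[j]; outer() carries the names over as dimnames
d <- outer(v, v, FUN = "-")
d

# Below the diagonal: later group minus earlier group, as in the OP's table
d["B", "A"]  # 2
d["C", "A"]  # 5
d["C", "B"]  # 3
```

The upper triangle then simply holds the same values with the opposite sign.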

Upvotes: 0

Valentin_Ștefan

Reputation: 6446

Not as smooth as the solution by @A5C1D2H2I1M1N2O1R2T1, but one could also approach this with the outer() function from base R:

df <- read.table(text =
                   "Group      v.1      v.2
                      A         2        3
                      B         4        4
                      C         7        9",
                 header = TRUE)

# matrix of pairwise differences for v.1
mat_dif1 <- outer(X = df$v.1, Y = df$v.1, FUN = "-")
mat_dif1[mat_dif1<0] <- 0

# matrix of pairwise differences for v.2
mat_dif2 <- outer(X = df$v.2, Y = df$v.2, FUN = "-")
mat_dif2[mat_dif2>0] <- 0

mat_dif1 + abs(mat_dif2)

##      [,1] [,2] [,3]
## [1,]    0    1    6
## [2,]    2    0    5
## [3,]    5    3    0

If you need the row and column names, then:

results <- mat_dif1 + abs(mat_dif2)
dimnames(results) <- list(df$Group, df$Group)
results
##   A B C
## A 0 1 6
## B 2 0 5
## C 5 3 0

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193627

You can probably use combn() with diff(), along with upper.tri() and lower.tri(), as follows:

m <- matrix(0, nrow = nrow(df), ncol = nrow(df), 
            dimnames=list(df$Group, df$Group))
m
#   A B C
# A 0 0 0
# B 0 0 0
# C 0 0 0

m[lower.tri(m)] <- combn(df$v.1, 2, FUN=diff)
m[upper.tri(m)] <- combn(df$v.2, 2, FUN=diff)
m
#   A B C
# A 0 1 6
# B 2 0 5
# C 5 3 0

Upvotes: 6
