msm80

Reputation: 21

How to create a binary matrix of exact matches in R

Apologies if this is a stupid question. I feel I could find an answer quickly if I just knew how to phrase it correctly!

In short: I have a large number of samples that come from a number of different sources. The specific source of the sample is not important, but knowing which samples come from the same source is.

So, what I have now is:

  Sample source
  S1      A
  S2      B
  S3      B
  S4      A
  S5      A

and what I need is:

   S1 S2 S3 S4 S5
S1  1  0  0  1  1
S2  0  1  1  0  0
S3  0  1  1  0  0
S4  1  0  0  1  1
S5  1  0  0  1  1

Any help would be very appreciated...

Upvotes: 2

Views: 80

Answers (2)

ThomasIsCoding

Reputation: 101064

You can combine tcrossprod with xtabs (or table), like below:

> tcrossprod(xtabs(~., df))
      Sample
Sample S1 S2 S3 S4 S5
    S1  1  0  0  1  1
    S2  0  1  1  0  0
    S3  0  1  1  0  0
    S4  1  0  0  1  1
    S5  1  0  0  1  1

or (thanks to @user12728748 for the comment)

> tcrossprod(table(df))
      Sample
Sample S1 S2 S3 S4 S5
    S1  1  0  0  1  1
    S2  0  1  1  0  0
    S3  0  1  1  0  0
    S4  1  0  0  1  1
    S5  1  0  0  1  1

Data

df <- data.frame(Sample = c("S1", "S2", "S3", "S4", "S5"), source = c("A", "B", "B", "A", "A"))
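
To see why this works, it can help to look at the intermediate step. The following is a minimal sketch, assuming the same df as above; the names inc, same and same_binary are only illustrative. table(df) builds a Sample-by-source 0/1 incidence matrix, and tcrossprod(inc) computes inc %*% t(inc), so entry (i, j) counts the sources shared by samples i and j. Because each sample has exactly one source here, that count is either 0 or 1.

```r
df <- data.frame(Sample = c("S1", "S2", "S3", "S4", "S5"),
                 source = c("A", "B", "B", "A", "A"))

# Sample-by-source incidence matrix: 1 where the sample has that source
inc <- table(df)

# inc %*% t(inc): entry (i, j) is the number of sources shared by samples i and j
same <- tcrossprod(inc)

# If a sample could appear in several rows, counts might exceed 1.
# Comparing against 0 forces a strict 0/1 matrix (assumption: that is what is wanted).
same_binary <- (same > 0) * 1
```

With the example data, same and same_binary hold the same values; the last step only matters if the Sample column contains repeats.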

Upvotes: 2

VitaminB16

Reputation: 1234

Using sapply() to loop over the vector of sample names:

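samples <- df$Sample
# For each sample, flag every row whose source equals that sample's source; * 1 turns TRUE/FALSE into 1/0
tab <- sapply(seq_along(samples), function(x) df$source == df$source[df$Sample == samples[x]]) * 1
dimnames(tab) <- list(samples, samples)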

> tab
   S1 S2 S3 S4 S5
S1  1  0  0  1  1
S2  0  1  1  0  0
S3  0  1  1  0  0
S4  1  0  0  1  1
S5  1  0  0  1  1

But there may well be a shorter one-line solution that I'm not aware of!


Data:

df <- data.frame(Sample = c("S1","S2","S3","S4","S5"), source = c("A","B","B","A","A"), stringsAsFactors = FALSE)
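
One possible one-liner, offered as a sketch rather than a definitive answer: outer() can compare every source against every other in a single call. The data below assumes the sources given in the question (S5 comes from source A), and m is just an illustrative name.

```r
df <- data.frame(Sample = c("S1", "S2", "S3", "S4", "S5"),
                 source = c("A", "B", "B", "A", "A"),
                 stringsAsFactors = FALSE)

# Compare each source with every other; unary + turns TRUE/FALSE into 1/0
m <- +outer(df$source, df$source, "==")
dimnames(m) <- list(df$Sample, df$Sample)
m
```

This avoids the explicit loop, but like the sapply() version it builds a full n-by-n matrix, so memory use grows quadratically with the number of samples.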

Upvotes: 0
