Reputation: 21
Apologies if this is a stupid question; I feel I could get an answer quickly if I just knew how to phrase it correctly!
In short: I have a large number of samples that come from a number of different sources. The specific source of the sample is not important, but knowing which samples come from the same source is.
So, what I have now is:
Sample source
S1 A
S2 B
S3 B
S4 A
S5 A
and what I need is:
   S1 S2 S3 S4 S5
S1  1  0  0  1  1
S2  0  1  1  0  0
S3  0  1  1  0  0
S4  1  0  0  1  1
S5  1  0  0  1  1
Any help would be much appreciated.
Upvotes: 2
Views: 80
Reputation: 101064
You can try tcrossprod + xtabs (or table), like below:
> tcrossprod(xtabs(~., df))
      Sample
Sample S1 S2 S3 S4 S5
    S1  1  0  0  1  1
    S2  0  1  1  0  0
    S3  0  1  1  0  0
    S4  1  0  0  1  1
    S5  1  0  0  1  1
or (thanks to @user12728748 for the comment)
> tcrossprod(table(df))
      Sample
Sample S1 S2 S3 S4 S5
    S1  1  0  0  1  1
    S2  0  1  1  0  0
    S3  0  1  1  0  0
    S4  1  0  0  1  1
    S5  1  0  0  1  1
Data
df <- data.frame(Sample = c("S1", "S2", "S3", "S4", "S5"), source = c("A", "B", "B", "A", "A"))
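For anyone wondering why this works, here is a minimal sketch that spells out the intermediate step. The names m and res are just illustrative, and it assumes each sample appears exactly once in df (with repeated samples the entries could exceed 1).

```r
df <- data.frame(Sample = c("S1", "S2", "S3", "S4", "S5"),
                 source = c("A", "B", "B", "A", "A"))

# Indicator matrix: one row per sample, one column per source,
# with a 1 where the sample belongs to that source
m <- table(df)

# tcrossprod(m) computes m %*% t(m): entry [i, j] counts the sources
# that samples i and j have in common, so it is 1 for samples sharing
# a source and 0 otherwise (given one row per sample)
res <- tcrossprod(m)
res
```

The diagonal is always 1 here because every sample shares a source with itself.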
Upvotes: 2
Reputation: 1234
Using sapply() to loop over the vector of sample names:
samples <- df$Sample
tab <- sapply(seq_along(samples), function(x) df$source == df$source[df$Sample == samples[x]]) * 1
dimnames(tab) <- list(samples, samples)
> tab
   S1 S2 S3 S4 S5
S1  1  0  0  1  1
S2  0  1  1  0  0
S3  0  1  1  0  0
S4  1  0  0  1  1
S5  1  0  0  1  1
There may well be a shorter one-line solution that I'm not aware of!
Data:
df <- data.frame(Sample = c("S1","S2","S3","S4","S5"), source = c("A","B","B","A","A"), stringsAsFactors = FALSE)
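One possible shorter variant, offered as a sketch rather than a definitive answer: base R's outer() can compare every pair of sources directly, which does the same pairwise comparison as the sapply() loop. This swaps in a different function from the one used above, and it assumes the rows of df are in the order you want for the matrix.

```r
df <- data.frame(Sample = c("S1", "S2", "S3", "S4", "S5"),
                 source = c("A", "B", "B", "A", "A"),
                 stringsAsFactors = FALSE)

# Compare every source with every other source; TRUE where they match.
# Multiplying by 1 turns the logical matrix into 0/1 values.
tab <- outer(df$source, df$source, "==") * 1

# Label rows and columns with the sample names
dimnames(tab) <- list(df$Sample, df$Sample)
tab
```

Since this works on the source column directly, it does not rely on the sample names being unique.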
Upvotes: 0