Reputation: 8136
I have some data in this form:
> agreers <- read.csv('agreers.csv')
> attach(agreers)
> head(agreers)
wain1 wain2 count
1 Founder36 Mnist10_269 673
2 Founder3 Mnist10_19 665
3 Mnist10_140 Mnist10_257 663
4 Founder1 Founder15 659
5 Founder21 Founder25 654
6 Founder15 Founder32 654
I created the data such that wain1 <= wain2, so each pair appears in the table only once. So this would be an undirected graph.
I want to create a connection matrix, like so:
Mnist10_269 Mnist10_19 Mnist10_257 . . .
Founder36 673 ? ?
Founder3 ? 665 ?
Mnist10_140 ? ? 663
. . .
where the ?'s will be zero if there isn't any data in agreers. So here's what I've tried:
> mat = matrix(0, nrow = length(unique(wain1)), ncol = length(unique(wain2)))
> rownames(mat) = unique(wain1)
> colnames(mat) = unique(wain2)
> for(i in as.integer(rownames(agreers))) mat[wain1[i], wain2[i]] = count[i]
It does something, i.e., mat gets updated with numbers, but the numbers aren't in the right place! For example, I would expect this to return 673.
> mat["Founder36","Mnist10_269"]
[1] 0
EDIT: Here's a bit more of the data file, to show the "duplicated levels in factors" problem. Note that Mnist10_140 appears twice in the first column, but with different values in the second column.
wain1,wain2,count
Founder36,Mnist10_269,673
Founder3,Mnist10_19,665
Mnist10_140,Mnist10_257,663
Founder1,Founder15,659
Founder21,Founder25,654
Founder15,Founder32,654
Mnist10_140,Mnist10_84,643
When processing just that subset of the data, I get warnings:
> agreers <- read.csv('temp.csv')
> connections <- xtabs(count ~ factor(wain1, levels = wain1) + factor(wain2, levels = wain2), agreers)
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
duplicated levels in factors are deprecated
Upvotes: 2
Views: 444
Reputation: 887138
Here is a variation of @cdeterman's approach (using df from the same post):
do.call(table, lapply(df[1:2], function(x)
factor(x, levels=unique(x))))*df[,3]
# wain2
# wain1 Mnist10_269 Mnist10_19 Mnist10_257 Founder15 Founder25 Founder32
# Founder36 673 0 0 0 0 0
# Founder3 0 665 0 0 0 0
# Mnist10_140 0 0 663 0 0 0
# Founder1 0 0 0 659 0 0
# Founder21 0 0 0 0 654 0
# Founder15 0 0 0 0 0 654
Upvotes: 1
Reputation: 19960
If you like base R, you can use table:
df <- read.table(header=TRUE, text=' wain1 wain2 count
Founder36 Mnist10_269 673
Founder3 Mnist10_19 665
Mnist10_140 Mnist10_257 663
Founder1 Founder15 659
Founder21 Founder25 654
Founder15 Founder32 654')
tab <- with(df,table(factor(wain1, levels=unique(wain1)),
factor(wain2, levels=unique(wain2))))
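# Fill each (wain1, wain2) cell by name. which(tab == 1) walks the table in
# column-major order, so it only lines up with df$count by coincidence here.
tab[cbind(as.character(df$wain1), as.character(df$wain2))] <- df$count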
tab
Mnist10_269 Mnist10_19 Mnist10_257 Founder15 Founder25 Founder32
Founder36 673 0 0 0 0 0
Founder3 0 665 0 0 0 0
Mnist10_140 0 0 663 0 0 0
Founder1 0 0 0 659 0 0
Founder21 0 0 0 0 654 0
Founder15 0 0 0 0 0 654
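As for why the loop in the question puts numbers in the wrong place: a likely cause, assuming wain1 and wain2 were read in as factors (the read.csv default before R 4.0.0), is that indexing with a factor uses its integer codes rather than its labels. The codes follow the alphabetically sorted levels, while the row names were set in first-appearance order. A minimal sketch with a made-up three-row subset:

```r
# Assumption: wain1 is a factor, as read.csv produced by default before R 4.0.0.
wain1 <- factor(c("Founder36", "Founder3", "Mnist10_140"))
mat <- matrix(0, nrow = 3, ncol = 1)
rownames(mat) <- as.character(unique(wain1))  # first-appearance order

levels(wain1)      # levels are sorted, not in first-appearance order
as.integer(wain1)  # these codes are what mat[wain1[i], ] actually indexes by

mat[wain1[1], 1] <- 673                # lands in row 2 ("Founder3"), not "Founder36"
by_factor <- mat["Founder36", 1]       # still 0

mat[as.character(wain1[1]), 1] <- 673  # indexing by name hits the intended row
by_name <- mat["Founder36", 1]         # now 673
```

Wrapping the indices in as.character() inside the original loop should fix it for the same reason.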
EDIT
As @DavidArenburg suggests, you can also use xtabs:
xtabs(count ~ factor(wain1, levels = unique(wain1)) + factor(wain2, levels = unique(wain2)), df)
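The same call should also handle the longer subset from the question's EDIT, where Mnist10_140 appears twice in wain1. The sketch below assumes that seven-row data is pasted inline; the renaming of the dimnames is only cosmetic and not part of the original suggestion. Using levels = unique(...) instead of levels = wain1 is what avoids the "duplicated levels" warning.

```r
# Seven-row subset from the question's EDIT; Mnist10_140 appears twice in wain1.
agreers <- read.csv(text = "wain1,wain2,count
Founder36,Mnist10_269,673
Founder3,Mnist10_19,665
Mnist10_140,Mnist10_257,663
Founder1,Founder15,659
Founder21,Founder25,654
Founder15,Founder32,654
Mnist10_140,Mnist10_84,643", stringsAsFactors = FALSE)

# unique() keeps first-appearance order and drops the repeated labels,
# so every factor level is distinct.
connections <- xtabs(
  count ~ factor(wain1, levels = unique(wain1)) +
          factor(wain2, levels = unique(wain2)),
  data = agreers
)
# Cosmetic: shorten the dimension labels derived from the formula terms.
names(dimnames(connections)) <- c("wain1", "wain2")

connections["Founder36", "Mnist10_269"]   # 673
connections["Mnist10_140", "Mnist10_84"]  # 643
connections["Founder3", "Mnist10_257"]    # 0, no such pair in the data
```

Both Mnist10_140 rows end up in a single table row, one count per column.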
Upvotes: 4
Reputation: 54
Have a look at the package reshape2:
library(reshape2)
agreers <- read.table(header = TRUE, stringsAsFactors = FALSE, sep = ',', text = "wain1,wain2,count\nFounder36,Mnist10_269,673\nFounder3,Mnist10_19,665\nMnist10_140,Mnist10_257,663\nFounder1,Founder15,659\nFounder21,Founder25,654\nFounder15,Founder32,654\n")
# Name the value column explicitly so dcast does not have to guess it.
conMat <- dcast(agreers, wain1 ~ wain2, value.var = "count", fill = 0)
rownames(conMat) <- conMat$wain1
conMat$wain1 <- NULL
conMat["Founder36","Mnist10_269"]
That should solve the problem.
EDIT: This does not result in sorted data. Have a look at @cdeterman's solution instead.
Upvotes: 1