mhwombat

Reputation: 8136

Creating a connection matrix from a data frame in R

I have some data in this form:

> agreers <- read.csv('agreers.csv')
> attach(agreers)
> head(agreers)
        wain1       wain2 count
1   Founder36 Mnist10_269   673
2    Founder3  Mnist10_19   665
3 Mnist10_140 Mnist10_257   663
4    Founder1   Founder15   659
5   Founder21   Founder25   654
6   Founder15   Founder32   654

I created the data such that wain1 <= wain2, so each pair appears in the table only once; in other words, the data describe an undirected graph.

I want to create a connection matrix, like so:

          Mnist10_269 Mnist10_19 Mnist10_257 . . .
Founder36    673           ?          ?
Founder3       ?         665          ?
Mnist10_140    ?           ?        663
  . . .

where each ? is zero when agreers has no row for that pair. Here's what I've tried:

> mat = matrix(0, nrow = length(unique(wain1)), ncol = length(unique(wain2)))
> rownames(mat) = unique(wain1)
> colnames(mat) = unique(wain2)
> for(i in as.integer(rownames(agreers))) mat[wain1[i], wain2[i]] = count[i]

It does something, i.e., mat gets updated with numbers, but the numbers aren't in the right place! For example, I would expect this to return 673.

> mat["Founder36","Mnist10_269"]
[1] 0

EDIT: Here's a bit more of the data file, to show the "duplicated levels in factors" problem. Note that Mnist10_140 appears twice in the first column, but with different values in the second column.

wain1,wain2,count
Founder36,Mnist10_269,673
Founder3,Mnist10_19,665
Mnist10_140,Mnist10_257,663
Founder1,Founder15,659
Founder21,Founder25,654
Founder15,Founder32,654
Mnist10_140,Mnist10_84,643

When processing just that subset of the data, I get warnings:

> agreers <- read.csv('temp.csv')
> connections <- xtabs(count ~ factor(wain1, levels = wain1) + factor(wain2, levels = wain2), agreers)
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated

Upvotes: 2

Views: 444

Answers (3)

akrun

Reputation: 887138

Here is a variation of @cdeterman's approach (using df from that answer):

 do.call(table, lapply(df[1:2], function(x) 
            factor(x, levels=unique(x))))*df[,3]
 #              wain2
 # wain1         Mnist10_269 Mnist10_19 Mnist10_257 Founder15 Founder25 Founder32
 # Founder36           673          0           0         0         0         0
 # Founder3              0        665           0         0         0         0
 # Mnist10_140           0          0         663         0         0         0
 # Founder1              0          0           0       659         0         0
 # Founder21             0          0           0         0       654         0
 # Founder15             0          0           0         0         0       654
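
One caveat: the multiplication recycles df[,3] down the columns of the table, so count[i] lands on row i only while every wain1 value occurs exactly once, as in the six-row sample. A minimal sketch of an alternative, assuming the seven-row data from the question's EDIT (where Mnist10_140 repeats) and that each pair occurs at most once, is to aggregate with tapply and zero-fill the empty cells. The names f1, f2 and m are made up for illustration.

```r
# Seven rows from the question's EDIT; Mnist10_140 appears twice in wain1.
df <- read.csv(text = "wain1,wain2,count
Founder36,Mnist10_269,673
Founder3,Mnist10_19,665
Mnist10_140,Mnist10_257,663
Founder1,Founder15,659
Founder21,Founder25,654
Founder15,Founder32,654
Mnist10_140,Mnist10_84,643", stringsAsFactors = FALSE)

# Factors whose levels follow order of first appearance.
f1 <- factor(df$wain1, levels = unique(df$wain1))
f2 <- factor(df$wain2, levels = unique(df$wain2))

# tapply sums count within each (wain1, wain2) cell; empty cells come back NA.
m <- tapply(df$count, list(wain1 = f1, wain2 = f2), sum)
m[is.na(m)] <- 0

m["Mnist10_140", "Mnist10_84"]  # 643
```

Because the counts are matched to cells by name, not by position, the result does not depend on how many rows share a wain1 value.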

Upvotes: 1

cdeterman

Reputation: 19960

If you prefer base R, you can use table:

df <- read.table(header=TRUE, text='   wain1       wain2 count
   Founder36 Mnist10_269   673
    Founder3  Mnist10_19   665
 Mnist10_140 Mnist10_257   663
    Founder1   Founder15   659
   Founder21   Founder25   654
   Founder15   Founder32   654')

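# Build a 0/1 table, keeping names in order of first appearance
tab <- with(df, table(factor(wain1, levels=unique(wain1)),
                   factor(wain2, levels=unique(wain2))))
# Overwrite the 1s with the counts. which() walks the table column by column,
# so this assumes the rows of df are in that same order (true for this sample).
tab[which(tab == 1)] = df$count
tab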

              Mnist10_269 Mnist10_19 Mnist10_257 Founder15 Founder25 Founder32
  Founder36           673          0           0         0         0         0
  Founder3              0        665           0         0         0         0
  Mnist10_140           0          0         663         0         0         0
  Founder1              0          0           0       659         0         0
  Founder21             0          0           0         0       654         0
  Founder15             0          0           0         0         0       654
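
For comparison, the loop in the question can also be repaired directly. A likely cause of the misplaced numbers (an assumption, since the R version is not stated) is that read.csv returned wain1 and wain2 as factors. A factor used as an index is treated as its integer codes, which follow the sorted level order, not the unique() order used for the row and column names. The sketch below reads the names as character and fills the matrix with a two-column character index, assuming each pair occurs at most once. The names rows and cols are illustrative.

```r
# Seven rows from the question's EDIT, read with names as character.
agreers <- read.csv(text = "wain1,wain2,count
Founder36,Mnist10_269,673
Founder3,Mnist10_19,665
Mnist10_140,Mnist10_257,663
Founder1,Founder15,659
Founder21,Founder25,654
Founder15,Founder32,654
Mnist10_140,Mnist10_84,643", stringsAsFactors = FALSE)

rows <- unique(agreers$wain1)
cols <- unique(agreers$wain2)

# Zero-filled matrix with dimnames in order of first appearance.
mat <- matrix(0, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))

# A two-column character matrix selects cells by (row name, column name).
mat[cbind(agreers$wain1, agreers$wain2)] <- agreers$count

mat["Founder36", "Mnist10_269"]  # 673
mat["Mnist10_140", "Mnist10_84"] # 643
```

This avoids both the explicit loop and any dependence on the order of the rows in the data frame.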

EDIT

As @DavidArenburg suggests, you can also use xtabs:

xtabs(count ~ factor(wain1, levels = unique(wain1)) + factor(wain2, levels = unique(wain2)), df)
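
The warning in the question's EDIT comes from passing levels = wain1, which contains Mnist10_140 twice; levels = unique(wain1) avoids it. A minimal sketch on those seven rows, building the factors first so the formula stays short (the variable name connections is taken from the question):

```r
# Seven rows from the question's EDIT; Mnist10_140 appears twice in wain1.
agreers <- read.csv(text = "wain1,wain2,count
Founder36,Mnist10_269,673
Founder3,Mnist10_19,665
Mnist10_140,Mnist10_257,663
Founder1,Founder15,659
Founder21,Founder25,654
Founder15,Founder32,654
Mnist10_140,Mnist10_84,643", stringsAsFactors = FALSE)

# Levels in order of first appearance, with duplicates removed.
agreers$wain1 <- factor(agreers$wain1, levels = unique(agreers$wain1))
agreers$wain2 <- factor(agreers$wain2, levels = unique(agreers$wain2))

# xtabs sums count within each (wain1, wain2) cell and fills the rest with 0.
connections <- xtabs(count ~ wain1 + wain2, data = agreers)

# Both entries for the repeated name end up in the same row.
connections["Mnist10_140", c("Mnist10_257", "Mnist10_84")]
```

The result has six rows and seven columns, one per distinct name in each column of the data.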

Upvotes: 4

Lars

Reputation: 54

Have a look at the reshape2 package:

library(reshape2)
agreers <- read.table(header = TRUE, stringsAsFactors = FALSE, sep = ',', text = "wain1,wain2,count\nFounder36,Mnist10_269,673\nFounder3,Mnist10_19,665\nMnist10_140,Mnist10_257,663\nFounder1,Founder15,659\nFounder21,Founder25,654\nFounder15,Founder32,654\n")
# Name the value column explicitly so dcast does not have to guess it
conMat <- dcast(agreers, wain1 ~ wain2, value.var = "count", fill = 0)
rownames(conMat) <- conMat$wain1
conMat$wain1 <- NULL

conMat["Founder36","Mnist10_269"]

That should solve the problem.

EDIT: This does not result in sorted data. Have a look at @cdeterman's solution instead.

Upvotes: 1
