PSR

Reputation: 198

Converting cross-sectional data into an adjacency matrix in R

I am trying to convert cross-sectional data into an adjacency matrix, as I want to analyze how often certain variables are present together with social network analysis. In case empirical examples would help with the logic, it's basically analogous to presenting 4 people with a choice of three objects; they can choose from 0 to 3 of the objects. I'd like to analyze how commonly different objects were chosen together and visualize this as a network of preferences.

The data is set up as cross-sectional data, below:

ID1 <- c(1,0,0)
ID2 <- c(1,0,1)
ID3 <- c(1,1,1)
ID4 <- c(0,0,0)
IDs <- c("1","2","3","4")
df <- data.frame(rbind(ID1, ID2, ID3, ID4))
df <- cbind(IDs, df)
colnames(df) <- c("ID", "Var1", "Var2", "Var3")

I'd like to create a weighted adjacency matrix for Var1, Var2 and Var3, with each cell containing the total number of times the two variables occur together among the observations.

So the basic procedure I was thinking about is to create a separate matrix for each row (each ID number) with a 1 or 0 for each cell indicating whether or not both variables are present for the ID. And then add these matrices together, so the final matrix gives the total number of joint appearances.
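The procedure described above can be sketched in a few lines of base R. This is only an illustration on the toy data from this question; the names `m`, `per_row`, and `adj` are made up for the example.

```r
# Toy data from the question: one row per ID, one 0/1 column per variable.
m <- rbind(c(1, 0, 0),
           c(1, 0, 1),
           c(1, 1, 1),
           c(0, 0, 0))
colnames(m) <- c("Var1", "Var2", "Var3")

# One co-occurrence matrix per row: outer(x, x)[i, j] is 1 only when both
# variable i and variable j are 1 for that ID.
per_row <- lapply(seq_len(nrow(m)), function(k) outer(m[k, ], m[k, ]))

# Element-wise sum of the per-row matrices gives the joint-appearance counts.
adj <- Reduce(`+`, per_row)
print(adj)
```

For this data the off-diagonal cells are 1 for Var1/Var2, 2 for Var1/Var3, and 1 for Var2/Var3. The diagonal holds how often each variable was chosen at all (3, 1, 2), so it may need to be zeroed before the matrix is treated as a network.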

I've been looking around and haven't quite gotten it right. I thought of using outer, but it'd need to work for each column in sequence. The answer to "Convert categorical data in data frame to weighted adjacency matrix" was pretty close, but I wasn't sure how the values were being added together; I ended up with a list of matrices whose values didn't correspond to the initial data. Another answer was also close, although it seemed to use a different type of data and gave me an adjacency matrix based on the IDs: http://r.789695.n4.nabble.com/Conversion-to-Adjacency-Matrix-td794102.html

Here is some very messy code that manually creates the matrix for one observation, just to give a sense of what I'm going for (using a vector representing only the first ID observation):

ID1 <- c(1,0,0)

var1 <- ID1[[1]]
var2 <- ID1[[2]]
var3 <- ID1[[3]]
onetwo <- var1 * var2
onethree <- var1 * var3
twothree <- var2 * var3
oneone <- var1 * var1
twotwo <- var2 * var2
threethree <- var3 * var3
rows1 <- rbind(oneone, onetwo, onethree)
rows2 <- rbind(onetwo, twotwo, twothree)
rows3 <- rbind(onethree, twothree, threethree)
df2 <- cbind(rows1, rows2, rows3)

This is obviously not ideal: my actual dataset has 198 observations and 33 variables, so even with looping or the apply functions, doing it this way would be very inefficient.

I can't tell if I'm making this more difficult than it needs to be, or if I'm trying to force my data to do something it wasn't meant to do. But if anyone has run into this sort of task before, please let me know. Is there a way to create the desired adjacency matrix directly? Should I convert this into an edge list first, and is there a good way to do that? Is there code that would make the first step (creating a matrix for each row of the data frame) more efficient?

Thanks for your help.

Upvotes: 2

Views: 779

Answers (1)

user974465

I'm not sure if I understand the question, but is this what you want?

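nc <- 33   # number of variables (columns)
nr <- 198  # number of observations (rows)
# random 0/1 data standing in for the real data set
m3 <- matrix(sample(0:1, nc * nr, replace = TRUE), nrow = nr)
df3 <- data.frame(m3)
m3b <- matrix(0, nrow = nc, ncol = nc)
for (i in seq_len(nc)) {
  for (j in seq_len(nc)) {
    # fixing the levels keeps the table 2 x 2 even if a column is all 0 or all 1
    t3 <- table(factor(df3[, i], levels = 0:1), factor(df3[, j], levels = 0:1))
    m3b[i, j] <- t3[2, 2]  # t3[2, 2] contains the count of df3[, i] = df3[, j] = 1
    # or, equivalently:
    # m3b[i, j] <- sum(df3[, i] == 1 & df3[, j] == 1)
  }
}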

Or, if you want the sum of the products, which gives the same result when every value is 0 or 1:

m3c <- matrix(0, nrow = nc, ncol = nc)
for (i in seq_len(nc)) {
  for (j in seq_len(nc)) {
    sv <- 0
    # accumulate the product of columns i and j over all rows
    for (k in seq_len(nr)) {
      vi <- df3[k, i]
      vj <- df3[k, j]
      sv <- sv + vi * vj
    }
    m3c[i, j] <- sv
  }
}
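The sum of products computed by the nested loops above is the matrix product of the transposed data with itself, so base R's `crossprod` should give the same counts in a single call. A minimal sketch, assuming the data can be held as a numeric 0/1 matrix; the seed and the names `m3d`, `idx`, and `edges` are illustrative only:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
nc <- 33
nr <- 198
m3 <- matrix(sample(0:1, nc * nr, replace = TRUE), nrow = nr)

# crossprod(m3) is t(m3) %*% m3: entry [i, j] is the sum over rows k of
# m3[k, i] * m3[k, j], i.e. the number of rows where both columns are 1.
m3d <- crossprod(m3)

# Optional edge list: one row per unordered pair of variables (upper triangle),
# with the co-occurrence count as the weight.
idx <- which(upper.tri(m3d), arr.ind = TRUE)
edges <- data.frame(from = idx[, "row"], to = idx[, "col"], weight = m3d[idx])
head(edges)
```

For a data frame like `df` in the question, the ID column would have to be dropped first, for example `crossprod(as.matrix(df[, -1]))`.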

Upvotes: 1
