rnorouzian

Reputation: 7517

Counting co-occurrence of levels of a variable within each level of another variable

My data below has two columns (studyID & post_id). The column post_id has 4 unique values (1 2 3 4).

I was wondering how to determine how many times each unique value of post_id (e.g., 1) co-occurs with another unique value of post_id (e.g., 2) in each level of studyID?

Is this possible to achieve in R? For this data, the expected output should be a matrix with the following 6 unique elements [row,col] in its lower triangle and NA everywhere else:

Across all levels of studyID, 1 with 2 co-occurs 31 times. [2,1]

Across all levels of studyID, 1 with 3 co-occurs 3 times. [3,1]

Across all levels of studyID, 1 with 4 co-occurs 1 time. [4,1]

Across all levels of studyID, 2 with 3 co-occurs 3 times. [3,2]

Across all levels of studyID, 2 with 4 co-occurs 1 time. [4,2]

Across all levels of studyID, 3 with 4 co-occurs 1 time. [4,3]

data <- read.csv("https://raw.githubusercontent.com/ilzl/i/master/pr.csv")[c(1,7)]

Upvotes: 1

Views: 165

Answers (2)

wutao

Reputation: 94

You can use group_by to count how many times each value of post_id occurs in each level of studyID. For co-occurrence, take each pair of post_id values and count the studies in which both counts are non-zero and equal to each other:

library(dplyr)
data <- read.csv("https://raw.githubusercontent.com/ilzl/i/master/pr.csv")[c(1,7)]
# One row per studyID, one column of counts per post_id value
a <- data %>% 
  group_by(studyID) %>% 
  summarise(`1` = sum(post_id == 1),
            `2` = sum(post_id == 2),
            `3` = sum(post_id == 3),
            `4` = sum(post_id == 4))

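# Square matrix indexed by the post_id values; every cell is filled by the loop
ids <- colnames(a)[-1]
mat <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))

for (i in ids) {
  for (j in ids) {
    tmp <- a %>% select(all_of(c(i, j)))
    tmp[tmp == 0] <- NA   # a zero count means the value is absent from that study
    tmp <- na.omit(tmp)   # keep only studies in which both values occur
    # Counts studies where both values occur equally often;
    # nrow(tmp) would count every study in which both occur at all
    mat[i, j] <- sum(tmp[[i]] == tmp[[j]])
  }
}
mat[!lower.tri(mat, diag = FALSE)] <- NA  # keep only the lower triangle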

Upvotes: 0

Onyambu

Reputation: 79338

`diag<-`(crossprod(table(data)>0), 0)

       post_id
post_id  1  2 3 4
      1  0 31 3 1
      2 31  0 3 1
      3  3  3 0 1
      4  1  1 1 0
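
For anyone unpacking the one-liner, here is a minimal sketch of what each step does. The `toy` data frame and its values are made up for illustration and are not the asker's CSV. It also keeps only the lower triangle, as the question asked, rather than zeroing the diagonal.

```r
# Hypothetical toy data: study A has posts {1, 2}, B has {1, 3}, C has {1, 2, 4}
toy <- data.frame(
  studyID = c("A", "A", "A", "B", "B", "C", "C", "C"),
  post_id = c(1, 2, 2, 1, 3, 1, 2, 4)
)

# Step 1: studyID x post_id table of row counts
tab <- table(toy)

# Step 2: turn counts into presence/absence, so repeated rows count once
present <- tab > 0

# Step 3: crossprod(present) is t(present) %*% present; cell [i, j] is the
# number of studies in which post_id i and post_id j are both present
co <- crossprod(present)

# Step 4: keep the lower triangle only, NA on the diagonal and above
co[!lower.tri(co)] <- NA
co
```

The `> 0` step is what makes this a count of studies rather than a count of rows: without it, a study with two rows for post_id 2 would contribute twice to every pair involving 2.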

Upvotes: 3
