Reputation: 7517
My data below has two columns (studyID & post_id). The column post_id has 4 unique values (1 2 3 4).
I was wondering how to determine how many times each unique value of post_id (e.g., 1) co-occurs with another unique value of post_id (e.g., 2) in each level of studyID?
For this data, the expected output should be a matrix with the following 6 unique elements [row,col] on its lower-triangle and NA everywhere else. Is this possible to achieve in R?
Across all levels of studyID, 1 with 2 co-occurs 31 times. [2,1]
Across all levels of studyID, 1 with 3 co-occurs 3 times. [3,1]
Across all levels of studyID, 1 with 4 co-occurs 1 time. [4,1]
Across all levels of studyID, 2 with 3 co-occurs 3 times. [3,2]
Across all levels of studyID, 2 with 4 co-occurs 1 time. [4,2]
Across all levels of studyID, 3 with 4 co-occurs 1 time. [4,3]
data <- read.csv("https://raw.githubusercontent.com/ilzl/i/master/pr.csv")[c(1,7)]
Upvotes: 1
Views: 165
Reputation: 94
You can use group_by to count how many times each value of post_id occurs in each level of studyID. For co-occurrence, take each pair of post_id values and count the studies in which the two counts produced by group_by are equal (excluding counts of 0):
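library(dplyr)
data <- read.csv("https://raw.githubusercontent.com/ilzl/i/master/pr.csv")[c(1,7)]
# Count how often each post_id value occurs within each studyID
a <- data %>%
  group_by(studyID) %>%
  summarise(`1` = sum(post_id == 1),
            `2` = sum(post_id == 2),
            `3` = sum(post_id == 3),
            `4` = sum(post_id == 4))
# Empty 4 x 4 result matrix, labelled by post_id value
ids <- colnames(a)[2:5]
mat <- matrix(NA_real_, nrow = 4, ncol = 4, dimnames = list(ids, ids))
for (i in ids) {
  for (j in ids) {
    tmp <- a %>% select(all_of(i), all_of(j))
    tmp[tmp == 0] <- NA   # a count of 0 means the value is absent from that study
    tmp <- na.omit(tmp)   # keep only studies that contain both values
    mat[i, j] <- sum(tmp[[i]] == tmp[[j]])  # studies where the two counts are equal
  }
}
mat[!lower.tri(mat, diag = FALSE)] <- NA  # keep only the lower triangle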
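Note that the comparison in the loop above counts a study only when both values occur in it the same number of times, which is not the same as counting every study where both values are present. Here is a minimal base R sketch of the difference; the counts table is made up for illustration and is not taken from pr.csv:

```r
# Hypothetical per-study counts (made up for illustration, not from pr.csv)
counts <- data.frame(
  studyID = c("A", "B", "C"),
  `1` = c(2, 1, 0),
  `2` = c(1, 1, 3),
  check.names = FALSE
)

x <- counts[["1"]]
y <- counts[["2"]]

# Studies that contain both values at least once
both_present <- x > 0 & y > 0

# What the equality comparison counts: both present AND equal counts
equal_counts <- sum(both_present & x == y)

# Presence-based co-occurrence: both present, regardless of counts
any_overlap <- sum(both_present)
```

In this toy table, study A contains both values but with different counts (2 vs 1), so it contributes to any_overlap and not to equal_counts. Which of the two definitions is wanted depends on how "co-occurs" is meant in the question.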
Upvotes: 0
Reputation: 79338
`diag<-`(crossprod(table(data)>0), 0)
       post_id
post_id  1  2  3  4
      1  0 31  3  1
      2 31  0  3  1
      3  3  3  0  1
      4  1  1  1  0
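To unpack the one-liner, here is a small sketch of the same idea in separate steps, with the lower-triangle/NA layout the question asked for. The toy data frame is an assumption made up for illustration, not the pr.csv data:

```r
# Hypothetical toy data (not the pr.csv file from the question)
toy <- data.frame(
  studyID = c("A", "A", "A", "B", "B", "C"),
  post_id = c(1, 1, 2, 1, 3, 2)
)

# studyID x post_id presence matrix: TRUE if the value appears in the study
present <- table(toy) > 0

# t(present) %*% present: entry [i, j] is the number of studies
# that contain both post_id i and post_id j
co <- crossprod(present)

# Keep only the lower triangle; the diagonal and upper triangle become NA
co[!lower.tri(co)] <- NA
co
```

In the toy data, values 1 and 2 share one study (A), 1 and 3 share one study (B), and 2 and 3 never appear together, so the lower triangle holds 1, 1 and 0. Because presence is tested with > 0, repeated values inside a study are counted once.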
Upvotes: 3