Josefien

Count how often two factors have the same output value

I want to calculate the number of times two individuals share the same group number. I'm working with a fairly large dataset (169 individuals and over 1,000 observations, i.e. rows), so I'm looking for an efficient way to count how often each pair of individuals was in the same group. My (simplified) data looks like this:

ID   Group number   Date       Time
Aa   1              15-06-22   15:05:22
Bd   1              15-06-22   15:05:27
Cr   2              15-06-22   15:07:12
Bd   1              15-06-22   17:33:15
Aa   2              15-06-22   17:36:54
Cr   2              15-06-22   17:37:01
...

I would like my output data to look like this:

Aa-Bd   Aa-Cr   Bd-Cr   ...
1       1       0

Or:

Occurrence   Dyad
1            Aa-Bd; Aa-Cr
0            Bd-Cr

Or even a matrix might work. I've been trying to replicate the solution posted for this question: "Count occurrences of a variable having two given values corresponding to one value of another variable", but for some reason my matrix remains empty, even though I know that certain individuals have been in groups with others.

Any help or suggestions would be greatly appreciated! I feel like the solution shouldn't be too complicated, but I can't seem to figure it out.

Thanks in advance!

Edit: some example data from dput():

dput(c[1:5,])
structure(list(Date = structure(c(19129, 19129, 19129, 19129, 
19129), class = "Date"), Time = c("11:05:58", "11:06:06", "11:06:16", 
"11:06:33", "11:06:59"), Data = structure(c(1L, 1L, 1L, 1L, 1L
), .Label = "Crossing", class = "factor"), Group = structure(c(5L, 
5L, 5L, 5L, 5L), .Label = c("Ankhase", "Baie Dankie", "Kubu", 
"Lemon Tree", "Noha"), class = "factor"), IDIndividual1 =    structure(c(158L, 
158L, 34L, 153L, 14L), .Label = c("Aaa", "Aal", "Aan", "Aapi", 
"Aar", "Aara", "Aare", "Aat", "Amst", "App", "Asis", "Awa", "Beir", 
"Bela", "Bet", "Buk", "Daa", "Dais", "Dazz", "Deli", "Dewe", 
"Dian", "Digb", "Dix", "Dok", "Dore", "Eina", "Eis", "Enge", 
"Fle", "Flu", "Fur", "Gale", "Gaya", "Gese", "Gha", "Ghid", "Gib", 
"Gil", "Ginq", "Gobe", "Godu", "Goe", "Gom", "Gran", "Gree", 
"Gri", "Gris", "Griv", "Guat", "Gub", "Guba", "Gubh", "Guz", 
"Haai", "Hee", "Heer", "Heli", "Hond", "Kom", "Lail", "Lewe", 
"Lif", "Lill", "Lizz", "Mara", "Mas", "Miel", "Misk", "Moes", 
"Mom", "Mui", "Naal", "Nak", "Ncok", "Nda", "Ndaw", "Ndl", "Ndon", 
"Ndum", "Nge", "Nko", "Nkos", "Non", "Nooi", "Numb", "Nurk", 
"Nuu", "Obse", "Oerw", "Oke", "Ome", "Oort", "Ouli", "Oup", "Palm", 
"Pann", "Papp", "Pie", "Piep", "Pix", "Pom", "Popp", "Prai", 
"Prat", "Pret", "Prim", "Puol", "Raba", "Rafa", "Ram", "Rat", 
"Rede", "Ree", "Reen", "Regi", "Ren", "Reno", "Rid", "Rim", "Rioj", 
"Riss", "Riva", "Rivi", "Roc", "Sari", "Sey", "Sho", "Sig", "Sirk", 
"Sitr", "Skem", "Sla", "Spe", "Summary", "Syl", "Tam", "Ted", 
"Tev", "Udup", "Uls", "Umb", "Unk", "UnkAM", "UnkBB", "UnkJ", 
"UnkJF", "UnkJM", "Upps", "Utic", "Utr", "Vla", "Vul", "Xala", 
"Xar", "Xeni", "Xia", "Xian", "Xih", "Xin", "Xinp", "Xop", "Yam", 
"Yamu", "Yara", "Yaz", "Yelo", "Yodo", "Yuko"), class = "factor"), 
Behaviour = structure(c(2L, 3L, 1L, 1L, 1L), .Label = c("Crossing", 
"First Approacher", "First Crosser", "Last Crosser", "Summary"
), class = "factor"), CrossingType = c("Road - Ground Level", 
"Road - Ground Level", "Road - Ground Level", "Road - Ground Level", 
"Road - Ground Level"), GPSS = c(-27.9999, -27.9999, -27.9999, 
-27.9999, -27.9999), GPSE = c(31.20376, 31.20376, 31.20376, 
31.20376, 31.20376), Context = structure(c(1L, 1L, 1L, 1L, 
1L), .Label = c("Crossing", "Feeding", "Moving", "Unknown"
), class = "factor"), Observers = structure(c(12L, 12L, 12L, 
12L, 12L), .Label = c("Christelle", "Christelle; Giulia", 
"Christelle; Maria", "Elif; Giulia", "Josefien; Zach; Flavia; Maria", 
"Mathieu", "Mathieu; Giulia", "Mike; Mila", "Mila", "Mila; Christelle; Giulia", 
"Mila; Elif", "Mila; Giulia", "Nokubonga; Mila", "Nokubonga; Tam; Flavia", 
"Nokubonga; Tam; Flavia; Maria", "Nokubonga; Zach; Flavia; Maria", 
"Tam; Flavia", "Tam; Zach; Flavia; Maria", "Zach", "Zach; Elif; Giulia", 
"Zach; Flavia; Maria", "Zach; Giulia"), class = "factor"), 
DeviceId = structure(c(10L, 10L, 10L, 10L, 10L), .Label = c("{129F4050-2294-0D43-890F-3B2DEF58FC1A}", 
"{1A678F44-DB8C-1245-8DD7-9C2F92F086CA}", "{1B249FD2-AA95-5745-9A32-56CDD0587018}", 
"{2C7026A6-6EDC-BA4F-84EC-3DDADFFD4FD7}", "{2E489E9F-00BE-E342-8CAE-941618B2F0E6}", 
"{359CEB57-351F-F54F-B2BD-77A05FB6C349}", "{3727647C-B73A-184B-B187-D6BF75646B84}", 
"{7A4E6639-7387-7648-88EC-7FD27A0F258A}", "{854B02F2-5979-174A-AAE8-398C21664824}", 
"{89B5C791-1F71-0149-A2F7-F05E0197F501}", "{D92DF19A-9021-A740-AD99-DCCE1D88E064}"
), class = "factor"), Obs.nr = c(1, 1, 1, 1, 1), Gp.nr = c(1, 
3, 3, 4, 5)), row.names = c(NA, -5L), groups = structure(list(
Obs.nr = 1, .rows = structure(list(1:5), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

Here, Gp.nr is my group number and IDIndividual1 is my ID.
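A minimal sketch of reducing this data to the two columns that pair counting needs. It assumes (this is not stated in the question) that a group should be identified by `Obs.nr` and `Gp.nr` together: that is harmless if `Gp.nr` is already unique across observations, and necessary if it restarts within each observation. `prepare_pairs_input()` and `GroupKey` are hypothetical names, and the mock rows only reuse the column names from the dput() output.

```r
## Hypothetical helper: one row per individual per group, where a group is keyed
## on observation number AND group number (assumption, see text above)
prepare_pairs_input <- function(dat) {
  out <- data.frame(
    ID = as.character(dat$IDIndividual1),            # factor -> character labels
    GroupKey = paste(dat$Obs.nr, dat$Gp.nr, sep = "_"),
    stringsAsFactors = FALSE
  )
  unique(out)                                        # drop repeated sightings
}

## Mock rows with the same column names as the dput() output
mock <- data.frame(
  IDIndividual1 = factor(c("Aa", "Aa", "Bd", "Cr", "Bd")),
  Obs.nr = c(1, 1, 1, 2, 2),
  Gp.nr = c(3, 3, 3, 3, 3)
)
prepare_pairs_input(mock)
```

Here Bd ends up in two different groups ("1_3" and "2_3") even though both have Gp.nr 3, which is the point of the combined key.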


Answers (2)

Deepansh Arora

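Check this out: find the individuals that share a group, build every possible pair of individuals with combn(), join the two and collapse the pairs by how often they occur: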

## Loading the libraries (dplyr is all this needs; tidyverse also attaches it)
library(dplyr)

## Creating the data frame
df = data.frame(ID = c("Aa","Bd","Cc","Dd","Cr"),
                GroupNumber = c(1,2,1,3,3))

## For each group with at least two members, list every pair of members
## (IDs sorted, so "Aa-Cc" and "Cc-Aa" are the same pair), then count in
## how many groups each pair occurs (reframe() needs dplyr >= 1.1.0)
df1 = df %>%
  distinct(ID, GroupNumber) %>%
  group_by(GroupNumber) %>%
  filter(n() > 1) %>%
  reframe(ID_ = combn(sort(ID), 2, paste, collapse = "-")) %>%
  count(ID_, name = "CountbyID")

## Creating all possible pair combinations, joining the counts (0 for pairs
## that never shared a group) and concatenating the pairs with the same count
df2 = data.frame(t(combn(sort(unique(df$ID)), 2))) %>%
  mutate(Comb = paste(X1, X2, sep = "-")) %>%
  left_join(df1, by = c("Comb" = "ID_")) %>%
  mutate(CountbyID = coalesce(CountbyID, 0L)) %>%
  select(Comb, CountbyID) %>%
  group_by(CountbyID) %>%
  summarise(ID = paste(Comb, collapse = ";"))


Hope this helps!

UPDATE

The way the data frame is set up is causing issues with the "IDIndividual1" column: it has more factor levels than there are unique values in the data. Therefore, I simply converted it to character. Try the code below:

## Loading the libraries (reframe() needs dplyr >= 1.1.0)
library(dplyr)

## Keeping only the ID and group-number columns
df = df[,c("IDIndividual1","Gp.nr")]
colnames(df) = c("ID","GroupNumber")
df$ID = as.character(df$ID) ## Converting factors to characters

## Pairs of individuals that share a group, and in how many groups they do
df1 = df %>%
  distinct(ID, GroupNumber) %>%
  group_by(GroupNumber) %>%
  filter(n() > 1) %>%
  reframe(ID_ = combn(sort(ID), 2, paste, collapse = "-")) %>%
  count(ID_, name = "CountbyID")

## All pairs of distinct individuals (0 for pairs that never shared a group),
## joined to the counts and collapsed by count
df2 = data.frame(t(combn(sort(unique(df$ID)), 2))) %>%
  mutate(Comb = paste(X1, X2, sep = "-")) %>%
  left_join(df1, by = c("Comb" = "ID_")) %>%
  mutate(CountbyID = coalesce(CountbyID, 0L)) %>%
  select(Comb, CountbyID) %>%
  group_by(CountbyID) %>%
  summarise(ID = paste(Comb, collapse = ";"))
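For comparison, the same pair counting can be sketched in base R without joining on pasted strings: split the IDs by group, enumerate every pair inside each group with `combn()`, and tabulate against the list of all possible pairs. This assumes a data frame with `ID` and `GroupNumber` columns as in the code above; `count_dyads()` is a hypothetical name, not part of the answer. Sorting the IDs first makes "Aa-Cc" and "Cc-Aa" the same dyad, and pairs that never share a group get a count of 0.

```r
## Hypothetical helper: for every pair of individuals, count the number of
## distinct groups in which both were seen
count_dyads <- function(df) {
  ## One sorted vector of unique IDs per group
  members <- lapply(split(df$ID, df$GroupNumber), function(x) sort(unique(x)))
  ## Every within-group pair, written as "A-B" with A sorted before B
  shared <- unlist(lapply(members[lengths(members) > 1], function(x)
    combn(x, 2, paste, collapse = "-")))
  ## All possible pairs, so that pairs with no shared group are counted as 0
  all_pairs <- combn(sort(unique(df$ID)), 2, paste, collapse = "-")
  counts <- table(factor(shared, levels = all_pairs))
  data.frame(Dyad = names(counts), Occurrence = as.integer(counts),
             stringsAsFactors = FALSE)
}

df <- data.frame(ID = c("Aa", "Bd", "Cc", "Dd", "Cr"),
                 GroupNumber = c(1, 2, 1, 3, 3))
dyads <- count_dyads(df)

## Collapse to the "Occurrence / Dyad" layout from the question
aggregate(Dyad ~ Occurrence, data = dyads, FUN = paste, collapse = "; ")
```

With the simplified table from the question (IDs Aa, Bd, Cr, Bd, Aa, Cr and group numbers 1, 1, 2, 1, 2, 2), `count_dyads()` gives Aa-Bd 1, Aa-Cr 1 and Bd-Cr 0, which matches the desired output.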

king_of_limes

This is not efficient at all, but as a starting point you can use the following (GN denotes the group-number column):

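my_ID <- unique(df$ID)
## Square matrix of counts with the IDs as row and column names
co_matrix <- matrix(0, nrow = length(my_ID), ncol = length(my_ID),
                    dimnames = list(my_ID, my_ID))
for (i in seq_along(my_ID)) {
  for (j in seq_along(my_ID)) {
    ## Number of distinct group numbers shared by individuals i and j
    ## (on the diagonal: the number of groups individual i was in)
    co_matrix[i, j] <- length(intersect(df$GN[df$ID == my_ID[i]],
                                        df$GN[df$ID == my_ID[j]]))
  }
}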
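The double loop can be replaced by a single matrix product, which should scale much better to 169 individuals. A sketch, assuming the same `ID` and `GN` columns: build an individual × group incidence matrix (1 if the individual appears in that group at least once) and multiply it by its own transpose with `tcrossprod()`. Entry [i, j] is then the number of distinct groups shared by i and j, the same quantity the loop computes with `intersect()`. The example uses the simplified data from the question; if group numbers are only unique within an observation, a combined observation/group key would be needed in place of `GN`.

```r
## Simplified data from the question (GN = group number)
df <- data.frame(ID = c("Aa", "Bd", "Cr", "Bd", "Aa", "Cr"),
                 GN = c(1, 1, 2, 1, 2, 2))

## Incidence matrix: rows = individuals, columns = group numbers,
## 1 if the individual was seen in that group at least once
inc <- unclass(table(df$ID, df$GN))
inc <- (inc > 0) * 1

## co_matrix[i, j] = number of distinct groups shared by individuals i and j;
## the diagonal holds the number of distinct groups each individual was in
co_matrix <- tcrossprod(inc)

## One row per dyad (upper triangle only, so each pair appears once)
idx <- which(upper.tri(co_matrix), arr.ind = TRUE)
dyad_counts <- data.frame(
  Dyad = paste(rownames(co_matrix)[idx[, "row"]],
               colnames(co_matrix)[idx[, "col"]], sep = "-"),
  Occurrence = co_matrix[idx]
)
dyad_counts
```

For this data the dyads come out as Aa-Bd 1, Aa-Cr 1 and Bd-Cr 0.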
