Creating an overlap matrix

Question

I have a list of individuals, charities, and years. I am trying to find out how many times individual i overlaps with individual j in a given charity and year. I would like to make a square matrix for every year and have any given cell tell me the number of overlaps.

Example of Data:

Individual    Year    Charity
    1         2003       A
    2         2003       A
    2         2003       B
    2         2005       A
   ...        ...       ...
   17         2003       A
   17         2003       B

Wanted Result 2003 (for every year):

    Individual       Individual_1    Individual_2    ...       Individual_17
        1                 .               1                      1
        2                 1               .                      2
       ...               ...             ...                    ...
        17                1               2                      .

I have heard that R is best for network data, but right now I am using Stata. I created a variable for each individual, and I am running an if condition that looks in the [_n+x] cell for the individual in the given column and places a one there. I was then going to aggregate these data. This seems to be working, but it is very time intensive and I am sure there could be an error.

qui forval j = 1/1750 {
    gen individual_`j' = 0
}

qui forval j = 1/1750 {
    replace individual_`j' = 1 if individual[_n+`j'] == 1 & year == 2002 & charity == "A"
}

qui forval j = 1/1750 {
    replace individual_`j' = 1 if individual[_n+`j'] == 1 & year == 2003 & charity == "A"
}

qui forval j = 1/1750 {
    replace individual_`j' = 1 if individual[_n+`j'] == 1 & year == 2004 & charity == "A"
}

qui forval j = 1/1750 {
    replace individual_`j' = 1 if individual[_n+`j'] == 1 & year == 2005 & charity == "A"
}

I would then sum over each charity. The data set is too large for this brute-force approach to work, so I am hoping there is an easier way.

I am open to doing this outside of Stata.

JeremyS · Accepted Answer

I recently did something similar. First, add a column combining year and charity. Then convert the data frame into a list holding, for each individual, the year/charity events that individual attended. I called your example data x.

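# Assumes x is a data frame with columns Individual, Year and Charity
# Tag each row with a combined year/charity "event" label
x$info <- paste(x$Year, x$Charity, sep = "_")

# One list element per individual, holding the events that individual attended
All_Groups.list <- vector(mode = "list", length = length(unique(x$Individual)))
names(All_Groups.list) <- as.character(unique(x$Individual))
for (i in seq_along(All_Groups.list)) {
  All_Groups.list[[i]] <- as.character(x$info[x$Individual == names(All_Groups.list)[i]])
}

# For every pair of individuals, count the events they have in common
Self.Cor.table <- sapply(All_Groups.list, function(x) {
  sapply(All_Groups.list, function(y) {
    length(x[x %in% y])
  })
})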

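The output is a square table (an overlap count, not a true correlation matrix, despite the name) in which each cell counts the events that the row individual and the column individual both attended: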

> Self.Cor.table
   1 2 17
1  1 1  1
2  1 3  2
17 1 2  2

This differs from your desired output in one respect: the diagonal holds the number of events attended by each individual instead of a ".". I think that number is important because each individual attends a different number of events.
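If the dots are wanted on the diagonal after all, the table can be post-processed. The following is only a sketch, assuming x contains just the example rows visible in the question (the elided "..." rows are left out). split() is used here as a compact stand-in for the loop above, and the names groups, overlap and attended are made up for this illustration.

```r
# Example rows from the question (elided rows omitted)
x <- data.frame(
  Individual = c(1, 2, 2, 2, 17, 17),
  Year       = c(2003, 2003, 2003, 2005, 2003, 2003),
  Charity    = c("A", "A", "B", "A", "A", "B")
)
x$info <- paste(x$Year, x$Charity, sep = "_")

# Events attended by each individual, as a named list
groups <- split(x$info, x$Individual)

# Pairwise count of shared events
overlap <- sapply(groups, function(a) sapply(groups, function(b) sum(a %in% b)))

# Keep the per-individual event counts, then blank out the diagonal
attended <- diag(overlap)
diag(overlap) <- NA
overlap
```

Keeping attended as a separate vector preserves the information that the diagonal carried, so it can still be used later, for example to scale the overlap counts.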

If you want it per year, subset the data frame by year and repeat for each subset.
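One way to do the per-year step without repeating the code by hand is sketched below. It swaps in a different technique from the nested sapply() above: a charity-by-individual incidence table multiplied with crossprod(). It assumes x holds only the example rows visible in the question, and overlap_by_year is a hypothetical helper name, not a function from any package.

```r
# Example rows from the question (elided rows omitted)
x <- data.frame(
  Individual = c(1, 2, 2, 2, 17, 17),
  Year       = c(2003, 2003, 2003, 2005, 2003, 2003),
  Charity    = c("A", "A", "B", "A", "A", "B")
)

# Hypothetical helper: returns one square overlap matrix per year
overlap_by_year <- function(d) {
  ids <- sort(unique(d$Individual))
  lapply(split(d, d$Year), function(s) {
    # Rows are charities, columns are individuals (same columns every year)
    counts <- unclass(table(s$Charity, factor(s$Individual, levels = ids)))
    inc <- (counts > 0) * 1   # 0/1 incidence, so duplicated rows count once
    m <- crossprod(inc)       # t(inc) %*% inc: charities shared by each pair
    diag(m) <- NA             # the question shows "." on the diagonal
    m
  })
}

res <- overlap_by_year(x)
res[["2003"]]
```

Fixing the factor levels to all individuals keeps every yearly matrix the same size, so individuals who are absent in a given year simply get zeros. With around 1750 individuals each matrix has about three million cells, which should still fit in memory as a dense matrix.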
