Identifying and summarizing discrete groups of nodes in R

Question

I am working on a network problem related to family/household composition. I have multiple edge tables, each containing id1, id2 and a relationship code giving the type of relationship between the two IDs. These tables are large, upwards of 7 million rows each. I also have a node table keyed on the same IDs, with various attributes.

What I want to produce is a summary table that counts households by number of adults and number of children, something like this:

                      Children

             1  2  3  4   total 
            --------------------
          1 | 1  0  1  0    2
            |
 Adults   2 | 3  5  4  1    13  
            |
          3 | 1  2  0  0    3
            |
      total | 5  7  5  1    18

Essentially I want to be able to identify and count distinct networks in my data.

My data is in the form:

             ID1  ID2   Relationship_Code

              X1   X2    Married 
              X1   X3    Parent/Child
              X1   X4    Parent/Child 
              X5   X6    Married
              X5   X7    Parent/Child 
              X6   X5    Married
               .    .     .
               .    .     .
               .    .     .

I also have a node table which contains date of birth and other variables from which adult/child status can be identified.

Any tips/hints on how to extract this summary information from the graph data frame would be very helpful and much appreciated.

Thanks

G5W · Accepted Answer

Producing the final table requires the node table, which you have not shown, but I can get you most of the way there.

I think the key is identifying the households. You can do this in igraph with components(): each connected component of the graph is a household. I will illustrate with a slightly more elaborate version of your example.

Data:

Census = read.table(text="ID1  ID2   Relationship_Code
              X1   X2    Married 
              X2   X1    Married 
              X1   X3    Parent/Child
              X1   X4    Parent/Child 
              X2   X3    Parent/Child
              X2   X4    Parent/Child 
              X5   X6    Married
              X5   X7    Parent/Child 
              X6   X7    Parent/Child 
              X6   X5    Married
              X8   X9    Married
              X9   X8    Married",
    header=TRUE)

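Now turn it into a graph, find the components, and check by plotting. (graph_from_edgelist builds a directed graph by default, but components defaults to weak connectivity, so edge direction does not affect the grouping.)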

library(igraph)
EL = as.matrix(Census[,1:2])
Pop = graph_from_edgelist(EL)
Households = components(Pop)
plot(Pop, vertex.color=rainbow(3, alpha=0.5)[Households$membership])
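To make the structure returned by components() more concrete, here is a minimal self-contained sketch on a small made-up edge list (hypothetical IDs, not the Census data above). Building the graph with directed = FALSE is my own assumption that edge direction carries no meaning for grouping people into households.

```r
library(igraph)

# Toy edge list (hypothetical IDs) describing two separate households
el <- matrix(c("X1", "X2",
               "X1", "X3",
               "X5", "X6"),
             ncol = 2, byrow = TRUE)

# Undirected graph: a relationship links two people regardless of direction
g  <- graph_from_edgelist(el, directed = FALSE)
hh <- components(g)

hh$no      # number of connected components (households)
hh$csize   # number of people in each household

# Member IDs of each household, keyed by component id
members <- split(V(g)$name, hh$membership)
members
```

The membership vector is in vertex order, so it can be attached back to a node table or used as a grouping variable in later aggregation steps.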

You said that you could label the nodes as to whether they represent adults or children. I will assume that we have such a labeling; the labels below are listed in vertex order (X1 to X9). From that, it is easy to count the number of adults and children by household and to make a table of household composition by adults and children.

V(Pop)$AdultChild = c('A', 'A', 'C', 'C', 'A', 'A', 'C', 'A', 'A')
AdultsByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership), 
    function(p) sum(p=='A'))
AdultsByHousehold
  Group.1 x
1       1 2
2       2 2
3       3 2

ChildrenByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership), 
    function(p) sum(p=='C'))
ChildrenByHousehold
  Group.1 x
1       1 2
2       2 1
3       3 0
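The two aggregate() calls can also be collapsed into a single cross-tabulation of household membership against the adult/child label. A minimal sketch, using stand-in vectors: membership and status are hypothetical names whose values mirror Households$membership and V(Pop)$AdultChild from the example above.

```r
# Stand-ins for Households$membership and V(Pop)$AdultChild
# (values copied from the example; the variable names are hypothetical)
membership <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
status     <- c("A", "A", "C", "C", "A", "A", "C", "A", "A")

# One row per household, one column per label. Fixing the factor levels
# keeps both columns even if one label never occurs in the data.
counts <- table(household = membership,
                status = factor(status, levels = c("A", "C")))
counts

adults_per_household   <- counts[, "A"]
children_per_household <- counts[, "C"]
```

This gives the same per-household counts as the two aggregate() results, in one object.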

table(AdultsByHousehold$x, ChildrenByHousehold$x)
    0 1 2
  2 1 1 1

In my bogus example, all households have two adults.
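For the real data, the adult/child label would come from the node table rather than a hand-typed vector, and the totals requested in the question can be added with addmargins(). The sketch below is only an illustration: the node table, its column names (id, dob), the reference date, and the age-18 cutoff are all assumptions, since the actual node table was not shown. Labels are matched to vertices by ID rather than by position, because the row order of a node table need not match the vertex order of the graph.

```r
library(igraph)

# Hypothetical edge list and node table; column names and dates are made up
edges <- data.frame(ID1 = c("X1", "X1", "X5"),
                    ID2 = c("X2", "X3", "X6"))
nodes <- data.frame(id  = c("X3", "X1", "X2", "X6", "X5"),
                    dob = as.Date(c("2012-05-01", "1980-01-15", "1982-07-30",
                                    "2015-03-09", "1990-11-21")))

g  <- graph_from_edgelist(as.matrix(edges), directed = FALSE)
hh <- components(g)$membership

# Assumed rule: adult = aged 18 or over on an assumed reference date
ref_date <- as.Date("2020-01-01")
age <- as.numeric(ref_date - nodes$dob) / 365.25
nodes$status <- ifelse(age >= 18, "A", "C")

# Align node attributes with graph vertices by ID, not by row position
status <- nodes$status[match(V(g)$name, nodes$id)]

adults   <- as.vector(tapply(status == "A", hh, sum))
children <- as.vector(tapply(status == "C", hh, sum))

# Households cross-tabulated by adult and child counts, with totals
result <- addmargins(table(Adults = adults, Children = children))
result
```

With several edge tables, one option is to rbind() their ID columns into a single edge list before building the graph, so that a household linked across tables ends up in one component.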
