Reputation: 77
I am working on a networking problem related to family/household composition. I have multiple edge tables containing id1, id2 and a relationship code to state the type of relationship between the identity variables. These tables are large, upwards of 7 million rows in each. I also have a node table which contains the same id and various attributes.
What I want to achieve is an adjacency matrix that gives summary statistics, something like this:
                   Children
            1    2    3    4   total
          --------------------------
        1 | 1    0    1    0     2
Adults  2 | 3    5    4    1    13
        3 | 1    2    0    0     3
    total | 5    7    5    1    18
Essentially I want to be able to identify and count distinct networks in my data.
My data is in the form:
ID1   ID2   Relationship_Code
X1    X2    Married
X1    X3    Parent/Child
X1    X4    Parent/Child
X5    X6    Married
X5    X7    Parent/Child
X6    X5    Married
.     .     .
.     .     .
.     .     .
I also have a node table which contains date of birth and other variables from which adult/child status can be identified.
Any tips/hints on how to extract this summary information from the graph data frame would be very helpful and much appreciated.
Thanks
Upvotes: 0
Views: 289
Reputation: 37661
Getting the final table that you want requires access to the node table, which you have not shown us, but I can get you most of the way there.
I think that the key to getting your result is identifying the households.
You can do this in igraph using components. The connected components are households.
I will illustrate with a slightly more elaborate version of your example.
Data:
Census = read.table(text="ID1 ID2 Relationship_Code
X1 X2 Married
X2 X1 Married
X1 X3 Parent/Child
X1 X4 Parent/Child
X2 X3 Parent/Child
X2 X4 Parent/Child
X5 X6 Married
X5 X7 Parent/Child
X6 X7 Parent/Child
X6 X5 Married
X8 X9 Married
X9 X8 Married",
header=TRUE)
Now turn it into a graph, find the components and check by plotting.
library(igraph)
EL = as.matrix(Census[,1:2])
Pop = graph_from_edgelist(EL)
Households = components(Pop)
plot(Pop, vertex.color=rainbow(3, alpha=0.5)[Households$membership])
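Plotting is only practical for a toy graph. With millions of rows, the component object itself can be inspected instead. The following is a minimal sketch on a made-up edge list (the names el, g, hh and membership_df are illustrative, not from the question). It relies on components() using weak connectivity by default, so edge direction does not affect the grouping.

```r
library(igraph)

# Made-up edge list with two households (illustrative data only)
el <- matrix(c("X1", "X2",
               "X1", "X3",
               "X5", "X6"), ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(el, directed = FALSE)

hh <- components(g)
hh$no             # number of households (connected components)
hh$csize          # number of people in each household
table(hh$csize)   # distribution of household sizes

# One row per person with the id of the household they belong to
membership_df <- data.frame(id = V(g)$name,
                            household = unname(hh$membership))
```

The membership_df table can then be joined to the node table by id, which avoids carrying attributes on the graph at all.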
You said that you could label the nodes as to whether they represent adults or children. I will assume that we have such a labeling. From that, it is easy to count the number of adults by household and children by household and to make a table of household decomposition by adults and children.
V(Pop)$AdultChild = c('A', 'A', 'C', 'C', 'A', 'A', 'C', 'A', 'A')
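The labels above are typed in by hand. With a real node table they would probably be derived from date of birth and aligned to the graph's vertex order. A rough sketch follows, assuming a node table called Nodes with columns id and dob, an age cutoff of 18, and a fixed reference date. All of these are assumptions, since the node table is not shown in the question.

```r
library(igraph)

# Hypothetical node table: column names, dates and row order are assumptions
Nodes <- data.frame(
  id  = c("X3", "X1", "X2"),
  dob = as.Date(c("2012-05-01", "1980-03-15", "1982-11-30"))
)
ref_date <- as.Date("2020-01-01")   # assumed reference date for computing age

g <- graph_from_edgelist(matrix(c("X1", "X2",
                                  "X1", "X3",
                                  "X2", "X3"), ncol = 2, byrow = TRUE))

# The node table need not be in vertex order, so look up each vertex by id
idx <- match(V(g)$name, Nodes$id)

# Approximate age in years; 18 is an assumed adult threshold
age <- as.numeric(ref_date - Nodes$dob[idx]) / 365.25
V(g)$AdultChild <- ifelse(age >= 18, "A", "C")
```

Matching on V(g)$name matters because the vertex order comes from the edge list, not from the node table.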
AdultsByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership),
    function(p) sum(p=='A'))
AdultsByHousehold
  Group.1 x
1       1 2
2       2 2
3       3 2
ChildrenByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership),
    function(p) sum(p=='C'))
ChildrenByHousehold
  Group.1 x
1       1 2
2       2 1
3       3 0
table(AdultsByHousehold$x, ChildrenByHousehold$x)
    0 1 2
  2 1 1 1
In my bogus example, all households have two adults.
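The table in the question also has row and column totals. One way to add them is base R's addmargins(). This is a small sketch in which the per-household counts are hard-coded copies of the toy results, so that it runs on its own; the variable names are illustrative.

```r
# Per-household counts from the toy example (hard-coded for a standalone block)
adults   <- c(2, 2, 2)
children <- c(2, 1, 0)

# Naming the arguments labels the two dimensions of the table
tab <- table(Adults = adults, Children = children)

# Append a "Sum" row and column holding the totals
full <- addmargins(tab)
full
```

The bottom-right cell of full is the total number of households.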
Upvotes: 2