Reputation: 3911
I have more than 500 relations (edges) between 381 nodes. How can I visualize the social network graph? I tried using igraph, but the output is unreadable because there are too many edges: they cover all the nodes, so the figure looks like a circle.
Do I have to use another tool to visualize it? I have two data frames. One is shown below: CUST_ID and SUB_CUST_ID are the node keys, and FMLY_RELN_CODE is the relation type. The other data frame holds the type of each node.
'data.frame': 426 obs. of 3 variables:
$ CUST_ID : int 21564 20672 1342 19239 1649 15039 15963 10455 12657 12921 ...
$ SUB_CUST_ID : int 20967 6462 12929 18556 13961 13961 5767 15377 17146 19488 ...
$ FMLY_RELN_CODE: int 13 12 17 13 13 13 14 12 99 13 ...
The other data frame gives the type of each node, and I want to color the nodes according to that information. How can I solve these two problems? If possible, I would also like to show the relation type on each edge, but I suspect that would make the plot slow to draw and hard to read.
UPDATE
OK, as Keith requested, here is more detail. There are two data frames. The first holds the relations between customers of an insurance company: each CUST_ID and SUB_CUST_ID pair is a relation, and FMLY_RELN_CODE is the type of that relation.
> str(network_df)
'data.frame': 257 obs. of 3 variables:
$ CUST_ID : int 21564 1342 15039 15963 10455 12657 9790 20267 21575 20534 ...
$ SUB_CUST_ID : int 20967 12929 13961 5767 15377 17146 19390 14629 5934 12708 ...
$ FMLY_RELN_CODE: int 13 17 13 14 12 99 14 14 17 14 ...
> summary(network_df)
CUST_ID SUB_CUST_ID FMLY_RELN_CODE
Min. : 14 Min. : 14 Min. :12.00
1st Qu.: 5949 1st Qu.: 5841 1st Qu.:13.00
Median :12469 Median :12277 Median :14.00
Mean :11648 Mean :11536 Mean :21.24
3rd Qu.:17057 3rd Qu.:17057 3rd Qu.:17.00
Max. :22242 Max. :22258 Max. :99.00
The other one tells me whether or not each customer is a fraud. I want to run a network analysis on the frauds. There are 326 customers, and the relations do not need a weight or a direction.
> str(mapping)
'data.frame': 381 obs. of 2 variables:
$ CUST_ID : int 110 257 361 472 525 545 560 810 939 985 ...
$ SIU_CUST_YN: chr "N" "Y" "N" "Y" ...
> summary(mapping)
CUST_ID SIU_CUST_YN
Min. : 14 Length:381
1st Qu.: 5949 Class :character
Median :12082 Mode :character
Mean :11651
3rd Qu.:16964
Max. :22258
Finally, I want to visualize a social network in which I can see the fraud network, with color indicating whether or not a customer is a fraud. If possible, I also want to display the relation type on each edge.
Upvotes: 0
Views: 154
Reputation: 4960
I would suggest, rather than visualizing the network directly, you visualize the adjacency matrix underlying the network.
For example:
library(igraph)
library(gplots)
set.seed(1)
# generate a fake adj matrix with two highly connected modules
mat <- rbind(matrix(rep(c(1,1,1,0,0,0), 3), nrow=3, byrow=TRUE),
matrix(rep(c(0,0,0,1,1,1), 3), nrow=3, byrow=TRUE))
This results in an adjacency matrix with two separate but completely connected components:
> mat
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 0 0 0
[2,] 1 1 1 0 0 0
[3,] 1 1 1 0 0 0
[4,] 0 0 0 1 1 1
[5,] 0 0 0 1 1 1
[6,] 0 0 0 1 1 1
Obviously, this is idealized, but it should give you an idea of what the visualization might look like with real data.
Next, we assign our node colors and visualize the matrix using a biclustering heatmap:
# assign a color to nodes
node_colors <- c('red', 'red', 'red', 'blue', 'blue', 'blue')
# visualize as a heatmap
heatmap.2(mat, trace='none', col='redgreen',
RowSideColors=node_colors, ColSideColors=node_colors)
The result should show two distinct blocks, one for each module, with the side bars carrying the node colors.
At least for a start, this should provide a cleaner way to understand the relationships underlying the data.
Later on, you can go back and create network visualizations for either smaller parts of the network or a pruned version of the network.
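As a rough sketch of what "smaller parts of the network" could mean in practice, one option is to split the graph into connected components and plot only one of them. The example below reuses the first five rows from the question and adds one made-up row (20967 to 1342, code 14) purely so that one component is larger than the rest; that row and the choice of "largest component" are assumptions for illustration, not something taken from the real data.

```r
library(igraph)

# First five rows come from the question; the sixth row is hypothetical,
# added only so that one connected component has more than two nodes.
edges <- data.frame(
  CUST_ID        = c("21564", "1342", "15039", "15963", "10455", "20967"),
  SUB_CUST_ID    = c("20967", "12929", "13961", "5767", "15377", "1342"),
  FMLY_RELN_CODE = c(13, 17, 13, 14, 12, 14)
)

# Extra columns of the data frame (here FMLY_RELN_CODE) become edge attributes
g <- graph_from_data_frame(edges, directed = FALSE)

# Assign every node to its connected component
comp <- components(g)

# Keep only the nodes of the largest component
keep <- unname(which(comp$membership == which.max(comp$csize)))
sub  <- induced_subgraph(g, keep)

# Plot the small subgraph, labelling each edge with its relation code
plot(sub, edge.label = E(sub)$FMLY_RELN_CODE)
```

With the real data, looping over the components (or over only those that contain a flagged customer) would give a series of small, readable plots instead of one crowded figure.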
In your case, it looks like you have an edge list with categorical "weights". You could read this in using the graph.edgelist function from igraph:
network_df <- data.frame(CUST_ID=c(21564, 1342, 15039, 15963, 10455),
                         SUB_CUST_ID=c(20967, 12929, 13961, 5767, 15377),
                         FMLY_RELN_CODE=c(13, 17, 13, 14, 12))
> network_df
CUST_ID SUB_CUST_ID FMLY_RELN_CODE
1 21564 20967 13
2 1342 12929 17
3 15039 13961 13
4 15963 5767 14
5 10455 15377 12
# convert ids to character to avoid being treated as indices
network_df$CUST_ID <- as.character(network_df$CUST_ID)
network_df$SUB_CUST_ID <- as.character(network_df$SUB_CUST_ID)
g <- graph.edgelist(as.matrix(network_df[,1:2]), directed=FALSE)
> g
IGRAPH UN-- 10 5 --
+ attr: name (v/c)
+ edges (vertex names):
[1] 21564--20967 1342 --12929 15039--13961 15963--5767 10455--15377
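The graph above carries only the node names. If the relation code and the fraud flag should travel with the graph, one possible approach is `graph_from_data_frame`, which keeps extra edge columns as edge attributes and accepts a second data frame of vertex attributes. The sketch below is an illustration under assumptions: the `SIU_CUST_YN` values are invented (the real ones live in the question's `mapping`), and the colors and plot sizes are arbitrary choices.

```r
library(igraph)

network_df <- data.frame(
  CUST_ID        = c(21564, 1342, 15039, 15963, 10455),
  SUB_CUST_ID    = c(20967, 12929, 13961, 5767, 15377),
  FMLY_RELN_CODE = c(13, 17, 13, 14, 12)
)

# Hypothetical fraud flags: every third id is marked "Y" for illustration only
ids <- unique(c(network_df$CUST_ID, network_df$SUB_CUST_ID))
mapping <- data.frame(
  CUST_ID     = ids,
  SIU_CUST_YN = ifelse(seq_along(ids) %% 3 == 0, "Y", "N")
)

# The first column of `vertices` is used as the node name; other columns
# become vertex attributes. Extra edge columns become edge attributes.
g <- graph_from_data_frame(network_df, directed = FALSE, vertices = mapping)

# plot() picks up the `color` vertex attribute and the `label` edge attribute
V(g)$color <- ifelse(V(g)$SIU_CUST_YN == "Y", "red", "lightblue")
E(g)$label <- E(g)$FMLY_RELN_CODE

plot(g, vertex.size = 8, vertex.label.cex = 0.7, edge.label.cex = 0.7)
```

On the full network the edge labels will probably be too dense to read, so this is likely more useful on a subset of the graph.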
Finally, to convert this to an adjacency matrix, use the get.adjacency function:
> get.adjacency(g, sparse=FALSE)
21564 20967 1342 12929 15039 13961 15963 5767 10455 15377
21564 0 1 0 0 0 0 0 0 0 0
20967 1 0 0 0 0 0 0 0 0 0
1342 0 0 0 1 0 0 0 0 0 0
12929 0 0 1 0 0 0 0 0 0 0
15039 0 0 0 0 0 1 0 0 0 0
13961 0 0 0 0 1 0 0 0 0 0
15963 0 0 0 0 0 0 0 1 0 0
5767 0 0 0 0 0 0 1 0 0 0
10455 0 0 0 0 0 0 0 0 0 1
15377 0 0 0 0 0 0 0 0 1 0
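To connect this matrix back to the colored heatmap, the side colors have to follow the row order of the matrix rather than the row order of `mapping`. A minimal sketch of that lookup is shown below. The fraud flags are made up for illustration, `graph_from_edgelist` and `as_adjacency_matrix` are the newer igraph names for the two functions used above, and base R's `heatmap` stands in for `heatmap.2` only to keep the example free of extra packages (both take `RowSideColors` and `ColSideColors`).

```r
library(igraph)

network_df <- data.frame(
  CUST_ID        = as.character(c(21564, 1342, 15039, 15963, 10455)),
  SUB_CUST_ID    = as.character(c(20967, 12929, 13961, 5767, 15377)),
  FMLY_RELN_CODE = c(13, 17, 13, 14, 12)
)

g   <- graph_from_edgelist(as.matrix(network_df[, 1:2]), directed = FALSE)
adj <- as_adjacency_matrix(g, sparse = FALSE)

# Hypothetical fraud flags, deliberately listed in a different order
# from the matrix rows
mapping <- data.frame(
  CUST_ID     = c(5767, 10455, 12929, 13961, 15039, 15377, 15963, 1342, 20967, 21564),
  SIU_CUST_YN = c("N", "N", "N", "N", "N", "N", "Y", "N", "Y", "Y")
)

# Look up each matrix row's flag by node name so colors match the matrix order
flag <- mapping$SIU_CUST_YN[match(rownames(adj), as.character(mapping$CUST_ID))]
node_colors <- ifelse(flag == "Y", "red", "blue")

# scale = "none" keeps the 0/1 entries as they are
heatmap(adj, scale = "none",
        RowSideColors = node_colors, ColSideColors = node_colors)
```

Using `match` on the row names avoids silently mislabelling nodes when the two data frames are sorted differently.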
Upvotes: 1
Reputation: 86
I would suggest you use jsprit.
I am using it for VRP and its derivatives in my research comparisons. I think it should work equally well for network analysis and optimization.
Check out algorithms like "ruin and recreate".
Upvotes: 0