Reputation: 3911

Network analysis with lots of edges and relations in R?

I have more than 500 relations (edges) between 381 nodes. How can I visualize the social network graph? I tried igraph, but its output is unreadable because there are too many edges. The edges cover all the nodes, so the figure just looks like a circle.

Do I have to use other tools to visualize it? I have two data frames. One is shown below. CUST_ID and SUB_CUST_ID are the node keys, and FMLY_RELN_CODE is the relation type. The other data frame has the type of each node.

'data.frame':   426 obs. of  3 variables:
 $ CUST_ID       : int  21564 20672 1342 19239 1649 15039 15963 10455 12657 12921 ...
 $ SUB_CUST_ID   : int  20967 6462 12929 18556 13961 13961 5767 15377 17146 19488 ...
 $ FMLY_RELN_CODE: int  13 12 17 13 13 13 14 12 99 13 ...

The other data frame supplies the type information, and I want to draw each node in a different color according to it. How can I solve these two problems? I would also like to show the relation type in the plot if I can, but I suspect that would make the output really slow to draw and hard to read.

UPDATE

OK, as Keith requested, here is more detail. There are two data frames. The first holds the relations between customers of an insurance company. Each CUST_ID and SUB_CUST_ID pair is a relation, and FMLY_RELN_CODE is the type of that relation.

> str(network_df)
'data.frame':   257 obs. of  3 variables:
 $ CUST_ID       : int  21564 1342 15039 15963 10455 12657 9790 20267 21575 20534 ...
 $ SUB_CUST_ID   : int  20967 12929 13961 5767 15377 17146 19390 14629 5934 12708 ...
 $ FMLY_RELN_CODE: int  13 17 13 14 12 99 14 14 17 14 ...
> summary(network_df)
    CUST_ID       SUB_CUST_ID    FMLY_RELN_CODE 
 Min.   :   14   Min.   :   14   Min.   :12.00  
 1st Qu.: 5949   1st Qu.: 5841   1st Qu.:13.00  
 Median :12469   Median :12277   Median :14.00  
 Mean   :11648   Mean   :11536   Mean   :21.24  
 3rd Qu.:17057   3rd Qu.:17057   3rd Qu.:17.00  
 Max.   :22242   Max.   :22258   Max.   :99.00

The other one tells me whether or not each customer is a fraud. I want to use network analysis to study the network of frauds. There are 326 customers, and the relations do not need weights or directions.

> str(mapping)
'data.frame':   381 obs. of  2 variables:
 $ CUST_ID    : int  110 257 361 472 525 545 560 810 939 985 ...
 $ SIU_CUST_YN: chr  "N" "Y" "N" "Y" ...
> summary(mapping)
    CUST_ID      SIU_CUST_YN       
 Min.   :   14   Length:381        
 1st Qu.: 5949   Class :character  
 Median :12082   Mode  :character  
 Mean   :11651                     
 3rd Qu.:16964                     
 Max.   :22258

Finally, I want to visualize a social network in which I can see the fraud network, with color showing whether or not a customer is a fraud. If possible, I also want to display the relation type on each edge.

Upvotes: 0

Answers (2)

Keith Hughitt

Reputation: 4960

I would suggest, rather than visualizing the network directly, you visualize the adjacency matrix underlying the network.

For example:

library(igraph)
library(gplots)

set.seed(1)

# generate a fake adj matrix with two highly connected modules
mat <- rbind(matrix(rep(c(1,1,1,0,0,0), 3), nrow=3, byrow=TRUE),
             matrix(rep(c(0,0,0,1,1,1), 3), nrow=3, byrow=TRUE))

This results in an adjacency matrix with two separate but completely connected components:

> mat
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    0    0    0
[2,]    1    1    1    0    0    0
[3,]    1    1    1    0    0    0
[4,]    0    0    0    1    1    1
[5,]    0    0    0    1    1    1
[6,]    0    0    0    1    1    1

Obviously, this is idealized, but it should give you an idea of what the visualization might look like with real data.
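As a quick sanity check of the "two separate components" claim, the same matrix can be turned into a graph and its connected components counted. This is only a sketch: the name g_demo is made up for illustration, and the diagonal entries are dropped so that they are not treated as self-loops.

```r
library(igraph)

# Same idealized matrix as above: two 3-node blocks of ones
mat <- rbind(matrix(rep(c(1,1,1,0,0,0), 3), nrow=3, byrow=TRUE),
             matrix(rep(c(0,0,0,1,1,1), 3), nrow=3, byrow=TRUE))

# Build an undirected graph; diag=FALSE ignores the ones on the diagonal
g_demo <- graph_from_adjacency_matrix(mat, mode="undirected", diag=FALSE)

# Connected components: how many there are and which nodes belong to each
comp <- components(g_demo)
comp$no          # number of components
comp$membership  # component id per node
```

Each block of ones in the matrix corresponds to one component, which is the pattern to look for in the heatmap of real data.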

Next, we assign our node colors and visualize the matrix using a biclustering heatmap:

# assign a color to nodes
node_colors <- c('red', 'red', 'red', 'blue', 'blue', 'blue')

# visualize as a heatmap
heatmap.2(mat, trace='none', col='redgreen', 
          RowSideColors=node_colors, ColSideColors=node_colors)

The result should be a heatmap with two solid blocks, one per module, and side bars colored by node group.

At least for a start, this should provide a cleaner way to understand the relationships underlying the data.

Later on, you can go back and create network visualizations for either smaller parts of the network or a pruned version of the network.
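One possible way to get such a smaller or pruned network is sketched below. It assumes the data look like the question's two data frames; the edges and nodes data frames and all of their values are hypothetical toy data, and "pruning" here simply means dropping isolated nodes and keeping the largest connected component, which may or may not be the right criterion for the real data.

```r
library(igraph)

# Hypothetical toy data shaped like network_df and mapping from the question
edges <- data.frame(
  CUST_ID        = c("1", "2", "3", "5", "6"),
  SUB_CUST_ID    = c("2", "3", "4", "6", "7"),
  FMLY_RELN_CODE = c(13, 14, 13, 17, 12),
  stringsAsFactors = FALSE
)
nodes <- data.frame(
  CUST_ID     = as.character(1:8),
  SIU_CUST_YN = c("N", "Y", "N", "N", "Y", "N", "N", "N"),
  stringsAsFactors = FALSE
)

# Extra columns become edge attributes (edges) and vertex attributes (nodes)
g <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

# Prune: drop nodes that have no relations at all
g_pruned <- delete_vertices(g, which(degree(g) == 0))

# Keep only the largest connected component
comp <- components(g_pruned)
biggest <- which.max(comp$csize)
g_sub <- induced_subgraph(g_pruned, which(comp$membership == biggest))

# Color nodes by fraud flag and label edges with the relation code
V(g_sub)$color <- ifelse(V(g_sub)$SIU_CUST_YN == "Y", "red", "lightblue")
plot(g_sub, edge.label = E(g_sub)$FMLY_RELN_CODE, vertex.size = 15)
```

Plotting one component at a time keeps the edge labels readable, which is hard to achieve when all edges are drawn at once.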

In your case, it looks like you have an edge list with categorical "weights". You could read this in using the graph.edgelist function from igraph:

network_df <- data.frame(
  CUST_ID        = c(21564, 1342, 15039, 15963, 10455),
  SUB_CUST_ID    = c(20967, 12929, 13961, 5767, 15377),
  FMLY_RELN_CODE = c(13, 17, 13, 14, 12)
)

> network_df
  CUST_ID SUB_CUST_ID FMLY_RELN_CODE
1   21564       20967             13
2    1342       12929             17
3   15039       13961             13
4   15963        5767             14
5   10455       15377             12

# convert ids to character to avoid being treated as indices
network_df$CUST_ID <- as.character(network_df$CUST_ID)
network_df$SUB_CUST_ID <- as.character(network_df$SUB_CUST_ID)
g <- graph.edgelist(as.matrix(network_df[,1:2]), directed=FALSE)

> g
IGRAPH UN-- 10 5 -- 
+ attr: name (v/c)
+ edges (vertex names):
[1] 21564--20967 1342 --12929 15039--13961 15963--5767  10455--15377

Finally, to convert this to an adjacency matrix, use the get.adjacency function (named as_adjacency_matrix in newer igraph releases):

> get.adjacency(g, sparse=FALSE)
      21564 20967 1342 12929 15039 13961 15963 5767 10455 15377
21564     0     1    0     0     0     0     0    0     0     0
20967     1     0    0     0     0     0     0    0     0     0
1342      0     0    0     1     0     0     0    0     0     0
12929     0     0    1     0     0     0     0    0     0     0
15039     0     0    0     0     0     1     0    0     0     0
13961     0     0    0     0     1     0     0    0     0     0
15963     0     0    0     0     0     0     0    1     0     0
5767      0     0    0     0     0     0     1    0     0     0
10455     0     0    0     0     0     0     0    0     0     1
15377     0     0    0     0     0     0     0    0     1     0
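To tie this back to the fraud flags, the side colors for the heatmap can be looked up from the second data frame by node name. The sketch below is an assumption-laden illustration: the SIU_CUST_YN values are made up, the IDs are stored as character, and base R's heatmap() is swapped in for heatmap.2 to avoid the extra dependency (both accept RowSideColors and ColSideColors).

```r
library(igraph)

# Stand-ins for network_df and mapping; the fraud flags are made up
network_df <- data.frame(
  CUST_ID     = c("21564", "1342", "15039", "15963", "10455"),
  SUB_CUST_ID = c("20967", "12929", "13961", "5767", "15377"),
  stringsAsFactors = FALSE
)
mapping <- data.frame(
  CUST_ID     = c("21564", "20967", "1342", "12929", "15039",
                  "13961", "15963", "5767", "10455", "15377"),
  SIU_CUST_YN = c("Y", "N", "N", "N", "Y", "Y", "N", "N", "N", "N"),
  stringsAsFactors = FALSE
)

# Edge list -> undirected graph -> dense adjacency matrix
g <- graph_from_edgelist(as.matrix(network_df), directed = FALSE)
adj <- as_adjacency_matrix(g, sparse = FALSE)

# Look up each node's flag by name, in the row order of the matrix
flag <- mapping$SIU_CUST_YN[match(rownames(adj), mapping$CUST_ID)]
side_colors <- ifelse(flag == "Y", "red", "blue")

# White = no relation, black = relation; side bars show the fraud flag
heatmap(adj, scale = "none", col = c("white", "black"),
        RowSideColors = side_colors, ColSideColors = side_colors)
```

Using match() on the row names keeps the colors aligned with the matrix even when the rows of mapping are in a different order from the vertices.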

Upvotes: 1

gopal krishna varshney

Reputation: 86

I would suggest using jsprit.

I am using it for VRP and its derivatives in my research comparisons. I think it should work equally well for network analysis and optimization.

Check out algorithms like "ruin and recreate".

Upvotes: 0
