Alex

Reputation: 245

Automating a loop in igraph using tidygraph

Hello, I hope all is well. I have edited my previous question and hope it is clearer now.

I created an igraph object and would like to run the same analysis several times, extracting some information in each iteration.

I can't share the whole data, so I am sharing just a small subset. df_edge is as follows:

library(dplyr)
job_1 <-c(1,2,6,6,5,6,7,8,6,8,8,6,6,8)
job_2 <- c(2,4,5,8,3,1,4,6,1,7,3,2,4,5)
weight <- c(1,1,1,2,1,1,2,1,1,1,2,1,1,1)

df_edge <- tibble(job_1,job_2,weight)
df_edge %>% glimpse()

Rows: 14
Columns: 3
$ job_1  <dbl> 1, 2, 6, 6, 5, 6, 7, 8, 6, 8, 8, 6, 6, 8
$ job_2  <dbl> 2, 4, 5, 8, 3, 1, 4, 6, 1, 7, 3, 2, 4, 5
$ weight <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1

df_node is as follows:

job_id <- c(1,2,3,4,5,6,7,8)
job_type <- c(1,2,0,0,3,1,1,1)

df_node <- tibble(job_id,job_type)
df_node %>% glimpse()

Rows: 8
Columns: 2
$ job_id   <dbl> 1, 2, 3, 4, 5, 6, 7, 8
$ job_type <dbl> 1, 2, 0, 0, 3, 1, 1, 1

Creating the igraph object:

library(igraph)
library(tidygraph)

tp_network_subset <- graph_from_data_frame(df_edge, vertices = df_node, directed = FALSE)

Summary of the job_type column in df_node:

    df_node %>%
     count(job_type)
   
# A tibble: 4 x 2
  job_type     n
     <dbl> <int>
1        0     2
2        1     4
3        2     1
4        3     1

What I am doing manually is the following:

### finding a job_id that belongs to job_type==1 category

    df_node %>% filter(job_type==1) %>%
    select(job_id) 

# A tibble: 4 x 1
  job_id
   <dbl>
1      1
2      6
3      7
4      8

### for instance, I picked one of them: job_id = 6
### using the job_id to create a subgraph by selecting the order-1 neighbors of this job_id (6)

node_test <- make_ego_graph(tp_network_subset,order = 1 ,nodes="6")

### creating a data frame of this subgraph, keeping only the non-isolated nodes

df_test <- as_tbl_graph(node_test[[1]]) %>% 
    activate(nodes) %>%
    filter(!node_is_isolated()) %>% 
    as_tibble()
df_test %>% glimpse()
Rows: 6
Columns: 2
$ name     <chr> "1", "2", "4", "5", "6", "8"
$ job_type <dbl> 1, 2, 0, 3, 1, 1

## the subgraph size is 6, which is one outcome of interest
### if the graph has zero length, I should stop here and pick another job_id that belongs to the job_type==1 category

In this example, the graph is not zero length, so I proceed to the next step.

 ### calculating the measure of interest with respect to the job_type==1 category
 
   df_test %>% 
    summarise(job_rate= (nrow(df_test %>% filter(job_type==1)))/(nrow(df_test %>% 
    filter(job_type %in% c(1,2,3)))))
# 0.6
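
As a side note, the same ratio can be computed without the nested `nrow()` calls. This is only a minimal sketch: it rebuilds `df_test` by hand from the glimpse output above so that it runs on its own, and the name `rate_tbl` is made up for the illustration.

```r
library(dplyr)

# df_test as printed above: nodes of the order-1 ego graph around job_id 6
df_test <- tibble(
  name     = c("1", "2", "4", "5", "6", "8"),
  job_type = c(1, 2, 0, 3, 1, 1)
)

# job_rate = (number of job_type 1 nodes) / (number of job_type 1, 2 or 3 nodes)
rate_tbl <- df_test %>%
  summarise(job_rate = sum(job_type == 1) / sum(job_type %in% c(1, 2, 3)))

rate_tbl$job_rate
```

Here three of the five nodes with job_type 1, 2 or 3 have job_type 1, which gives the 0.6 shown above.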

If job_rate > 0.5, I want to keep the job_rate and the rows (corresponding nodes) of the job_type==0 category of the subgraph. In this instance, job_rate was 0.6, so I am keeping the following:

 df_final <- as_tbl_graph(node_test[[1]]) %>% 
        activate(nodes) %>%
        filter(!node_is_isolated()) %>% 
        as_tibble() %>% filter(job_type==0)

# A tibble: 1 x 2
   name  job_type
    <chr>    <dbl>
1    4            0

But I also need to attach the corresponding job_rate and some other related columns. So my desired outcome would be:

     name  job_type  subgraph_origin_id  job_rate  subgraph_size  no_(job_type==0)_in_subgrapgh  no_(job_type==1)_in_subgrapgh  no_(job_type==2)_in_subgrapgh  no_(job_type==3)_in_subgrapgh
    <chr>     <dbl>
1       4         0                   6       0.6              6

So I need to repeat this process and create subgraphs for all job_type==1 nodes. If the subgraph is not zero length and its job_rate > 0.5, I want to extract all the corresponding nodes in that subgraph along with the job_rate and the other columns shown in the desired outcome.

Upvotes: 1

Views: 320

Answers (1)

ThomasIsCoding

Reputation: 102241

Does this work for you?

# split by the column itself rather than relying on a global job_type vector
dflst <- split(df_node, df_node$job_type)
tpe <- as.numeric(names(dflst))
out <- tibble()
for (i in seq_along(dflst)) {
  df <- dflst[[i]]
  node_test_lst <- make_ego_graph(tp_network_subset, order = 1, nodes = df$job_id)
  origin_id <- df$job_id
  jtpe <- tpe[i]
  for (j in seq_along(node_test_lst)) {
    node_test <- node_test_lst[[j]]
    df_test <- as_tbl_graph(node_test) %>%
      activate(nodes) %>%
      filter(!node_is_isolated()) %>%
      as_tibble()
    if (nrow(df_test %>% filter(job_type == 0)) > 0 && any(df_test$job_type %in% 1:3)) {
      job_rate <- with(df_test, sum(job_type == jtpe) / sum(job_type %in% 1:3))
      if (job_rate > 0.5) {
        df_final <- df_test %>%
          filter(job_type == 0) %>%
          mutate(
            subgraph_origin_id = origin_id[j],
            job_rate = job_rate,
            subgraph_size = nrow(df_test)
          ) %>%
          cbind(
            setNames(
              as.list(table(factor(df_test$job_type, levels = 0:3))),
              sprintf("no_(job_type==%s)_in_subgrapgh", 0:3)
            )
          )
        out <- out %>% rbind(df_final)
      }
    }
  }
}

which gives

> out
  name job_type subgraph_origin_id job_rate subgraph_size
1    4        0                  6     0.60             6
2    4        0                  7     1.00             3
3    3        0                  8     0.75             5
  no_(job_type==0)_in_subgrapgh no_(job_type==1)_in_subgrapgh
1                             1                             3
2                             1                             2
3                             1                             3
  no_(job_type==2)_in_subgrapgh no_(job_type==3)_in_subgrapgh
1                             1                             1
2                             0                             0
3                             0                             1
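
If only the job_type==1 origins are of interest, the same idea can be wrapped in a small function. The following is a minimal sketch under a few assumptions, not a drop-in replacement: `ego_summary()`, `target_type` and `min_rate` are hypothetical names introduced here, isolated nodes are dropped with `igraph::degree()` instead of tidygraph's `node_is_isolated()`, and the per-type count columns are left out for brevity.

```r
library(igraph)
library(dplyr)

# Sample data from the question
df_edge <- tibble(
  job_1  = c(1, 2, 6, 6, 5, 6, 7, 8, 6, 8, 8, 6, 6, 8),
  job_2  = c(2, 4, 5, 8, 3, 1, 4, 6, 1, 7, 3, 2, 4, 5),
  weight = c(1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1)
)
df_node <- tibble(
  job_id   = c(1, 2, 3, 4, 5, 6, 7, 8),
  job_type = c(1, 2, 0, 0, 3, 1, 1, 1)
)
g <- graph_from_data_frame(df_edge, vertices = df_node, directed = FALSE)

# Hypothetical helper: summarise the order-1 ego graph around one origin node.
# Returns NULL when the subgraph is empty or its job_rate is not above min_rate.
ego_summary <- function(graph, origin, target_type = 1, min_rate = 0.5) {
  ego <- make_ego_graph(graph, order = 1, nodes = as.character(origin))[[1]]

  # keep only non-isolated nodes of the ego graph
  keep  <- igraph::degree(ego) > 0
  nodes <- tibble(name = V(ego)$name[keep], job_type = V(ego)$job_type[keep])

  size  <- nrow(nodes)
  typed <- sum(nodes$job_type %in% 1:3)
  if (size == 0 || typed == 0) return(NULL)

  rate <- sum(nodes$job_type == target_type) / typed
  if (rate <= min_rate) return(NULL)

  # keep the job_type == 0 nodes and attach the subgraph-level columns
  nodes %>%
    filter(job_type == 0) %>%
    mutate(subgraph_origin_id = origin, job_rate = rate, subgraph_size = size)
}

# Run the helper for every job_type == 1 node and stack the results
origins <- df_node %>% filter(job_type == 1) %>% pull(job_id)
out <- bind_rows(lapply(origins, ego_summary, graph = g))
out
```

On the sample data this should yield the same three rows as above (origins 6, 7 and 8). Keeping the per-origin logic in one function makes it easier to test a single node, for example `ego_summary(g, 6)`, before running it over the full network.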

Upvotes: 1
