How to create a network if I only have the edge names?

I'm trying to connect authors that are cited in the same process. My nodes are the authors and my edges are the processes, but I don't know how to create an edge list.

What I have now is shown in the example data below ('Doutrina' means author, 'Numero' means process number).

I want an edge list with one row per pair of authors and a column 'N' giving how many times that connection happens, i.e. how many times the two authors are cited together.


example data:

library(dplyr)

df <- tribble(
  ~Doutrina,           ~Numero,
  "MILARE, 2014",      "1009526-53.2015.8.26.0032",
  "SEGUIN, 2000",      "0054387-89.2011.8.26.0224",
  "SILVA, 2009",       "0054387-89.2011.8.26.0224",
  "MILARE, 2015",      "0000351-14.2013.8.26.0326",
  "SILVA, 2011",       "0000351-14.2013.8.26.0326",
  "MAXIMILIANO, 1961", "0000351-14.2013.8.26.0326",
  "SILVA, 2009",       "0000431-26.2013.8.26.0698",
  "SEGUIN, 2000",      "0000431-26.2013.8.26.0698",
  "SILVA, 2009",       "0054391-29.2011.8.26.0224",
  "SEGUIN, 2000",      "0054391-29.2011.8.26.0224",
  "MAXIMILIANO, 2015", "0012360-28.2010.8.26.0224",
  "MILARE, 2015",      "0012360-28.2010.8.26.0224"
)

df
#> # A tibble: 12 x 2
#>    Doutrina          Numero                   
#>    <chr>             <chr>                    
#>  1 MILARE, 2014      1009526-53.2015.8.26.0032
#>  2 SEGUIN, 2000      0054387-89.2011.8.26.0224
#>  3 SILVA, 2009       0054387-89.2011.8.26.0224
#>  4 MILARE, 2015      0000351-14.2013.8.26.0326
#>  5 SILVA, 2011       0000351-14.2013.8.26.0326
#>  6 MAXIMILIANO, 1961 0000351-14.2013.8.26.0326
#>  7 SILVA, 2009       0000431-26.2013.8.26.0698
#>  8 SEGUIN, 2000      0000431-26.2013.8.26.0698
#>  9 SILVA, 2009       0054391-29.2011.8.26.0224
#> 10 SEGUIN, 2000      0054391-29.2011.8.26.0224
#> 11 MAXIMILIANO, 2015 0012360-28.2010.8.26.0224
#> 12 MILARE, 2015      0012360-28.2010.8.26.0224


Answers (1)

CJ Yetman

I modified your example data so the results would be more interesting.

library(dplyr)

df <- tribble(
  ~Doutrina,           ~Numero,
  "MILARE, 2014",      "1009526-53.2015.8.26.0032",
  "SEGUIN, 2000",      "0054387-89.2011.8.26.0224",
  "SILVA, 2009",       "0054387-89.2011.8.26.0224",
  "MILARE, 2015",      "0000351-14.2013.8.26.0326",
  "SILVA, 2011",       "0000351-14.2013.8.26.0326",
  "MAXIMILIANO, 1961", "0000351-14.2013.8.26.0326",
  "SILVA, 2009",       "0000431-26.2013.8.26.0698",
  "SEGUIN, 2000",      "0000431-26.2013.8.26.0698",
  "SILVA, 2009",       "0054391-29.2011.8.26.0224",
  "SEGUIN, 2000",      "0054391-29.2011.8.26.0224",
  "MAXIMILIANO, 2015", "0012360-28.2010.8.26.0224",
  "MILARE, 2015",      "0012360-28.2010.8.26.0224"
)

df %>% 
  mutate(Doutrina = sub(", [0-9]{4}", "", Doutrina)) %>%  # remove the year
  full_join(x = ., y = ., by = "Numero") %>%  # join data to itself by Numero
  select(Doutrina = Doutrina.x, Doutrina2 = Doutrina.y) %>%  # keep only name columns
  filter(Doutrina != Doutrina2) %>%  # remove self-reference rows
  filter(Doutrina < Doutrina2) %>%  # keep only one direction of each edge/link
  group_by(Doutrina, Doutrina2) %>% 
  summarise(N = n(), .groups = "drop")
#> # A tibble: 4 x 3
#>   Doutrina    Doutrina2     N
#>   <chr>       <chr>     <int>
#> 1 MAXIMILIANO MILARE        2
#> 2 MAXIMILIANO SILVA         1
#> 3 MILARE      SILVA         1
#> 4 SEGUIN      SILVA         3
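
As a cross-check on the self-join above, the same counts can be sketched in base R with a co-occurrence matrix. This is only an illustration, not part of the original answer: the helper column `Autor` and the object names `pairs`, `incidence`, `co` and `edges` are introduced here, and it assumes each author should count at most once per process.

```r
# Same sample data as above, rebuilt with base R only (no packages needed).
df <- data.frame(
  Doutrina = c("MILARE, 2014", "SEGUIN, 2000", "SILVA, 2009", "MILARE, 2015",
               "SILVA, 2011", "MAXIMILIANO, 1961", "SILVA, 2009", "SEGUIN, 2000",
               "SILVA, 2009", "SEGUIN, 2000", "MAXIMILIANO, 2015", "MILARE, 2015"),
  Numero = c("1009526-53.2015.8.26.0032", "0054387-89.2011.8.26.0224",
             "0054387-89.2011.8.26.0224", "0000351-14.2013.8.26.0326",
             "0000351-14.2013.8.26.0326", "0000351-14.2013.8.26.0326",
             "0000431-26.2013.8.26.0698", "0000431-26.2013.8.26.0698",
             "0054391-29.2011.8.26.0224", "0054391-29.2011.8.26.0224",
             "0012360-28.2010.8.26.0224", "0012360-28.2010.8.26.0224")
)

# Strip the year so "SILVA, 2009" and "SILVA, 2011" count as one author.
# 'Autor' is a helper column introduced for this sketch.
df$Autor <- sub(", [0-9]{4}$", "", df$Doutrina)

# Keep one row per process/author so repeated citations are not double-counted.
pairs <- unique(df[, c("Numero", "Autor")])

# Process x author incidence table (1 if the author is cited in the process).
incidence <- table(pairs$Numero, pairs$Autor)

# Author x author co-citation counts; the diagonal holds each author's process count.
co <- crossprod(incidence)

# Take the upper triangle so each undirected edge appears exactly once.
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
edges <- data.frame(
  Doutrina  = rownames(co)[idx[, "row"]],
  Doutrina2 = colnames(co)[idx[, "col"]],
  N         = co[idx]
)
edges
```

On this sample data, `edges` should contain the same four author pairs and counts as the dplyr result. If the igraph package is installed, an edge list in this shape could be passed to `igraph::graph_from_data_frame(edges, directed = FALSE)`, which stores `N` as an edge attribute.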
