Reputation: 23
I'm trying to connect authors that are cited in the same process. My nodes are the authors and the edges the processes, but I don't know how to create an edgelist.
What I have now ('Doutrina' means Author, 'Numero' means process number):
I want something like this (here 'N' means how many times this connection happens, i.e. how many times the two authors are cited together):
Example data:
library(dplyr)
df <- tribble(
~Doutrina, ~Numero,
"MILARE, 2014", "1009526-53.2015.8.26.0032",
"SEGUIN, 2000", "0054387-89.2011.8.26.0224",
"SILVA, 2009", "0054387-89.2011.8.26.0224",
"MILARE, 2015", "0000351-14.2013.8.26.0326",
"SILVA, 2011", "0000351-14.2013.8.26.0326",
"MAXIMILIANO, 1961", "0000351-14.2013.8.26.0326",
"SILVA, 2009", "0000431-26.2013.8.26.0698",
"SEGUIN, 2000", "0000431-26.2013.8.26.0698",
"SILVA, 2009", "0054391-29.2011.8.26.0224",
"SEGUIN, 2000", "0054391-29.2011.8.26.0224",
"MAXIMILIANO, 2015", "0012360-28.2010.8.26.0224",
"MILARE, 2015", "0012360-28.2010.8.26.0224"
)
df
#> # A tibble: 12 x 2
#>    Doutrina          Numero
#>    <chr>             <chr>
#>  1 MILARE, 2014      1009526-53.2015.8.26.0032
#>  2 SEGUIN, 2000      0054387-89.2011.8.26.0224
#>  3 SILVA, 2009       0054387-89.2011.8.26.0224
#>  4 MILARE, 2015      0000351-14.2013.8.26.0326
#>  5 SILVA, 2011       0000351-14.2013.8.26.0326
#>  6 MAXIMILIANO, 1961 0000351-14.2013.8.26.0326
#>  7 SILVA, 2009       0000431-26.2013.8.26.0698
#>  8 SEGUIN, 2000      0000431-26.2013.8.26.0698
#>  9 SILVA, 2009       0054391-29.2011.8.26.0224
#> 10 SEGUIN, 2000      0054391-29.2011.8.26.0224
#> 11 MAXIMILIANO, 2015 0012360-28.2010.8.26.0224
#> 12 MILARE, 2015      0012360-28.2010.8.26.0224
Upvotes: 0
Views: 49
Reputation: 8848
I modified your example data so the results would be more interesting.
library(dplyr)
df <- tribble(
~Doutrina, ~Numero,
"MILARE, 2014", "1009526-53.2015.8.26.0032",
"SEGUIN, 2000", "0054387-89.2011.8.26.0224",
"SILVA, 2009", "0054387-89.2011.8.26.0224",
"MILARE, 2015", "0000351-14.2013.8.26.0326",
"SILVA, 2011", "0000351-14.2013.8.26.0326",
"MAXIMILIANO, 1961", "0000351-14.2013.8.26.0326",
"SILVA, 2009", "0000431-26.2013.8.26.0698",
"SEGUIN, 2000", "0000431-26.2013.8.26.0698",
"SILVA, 2009", "0054391-29.2011.8.26.0224",
"SEGUIN, 2000", "0054391-29.2011.8.26.0224",
"MAXIMILIANO, 2015", "0012360-28.2010.8.26.0224",
"MILARE, 2015", "0012360-28.2010.8.26.0224"
)
df %>%
  mutate(Doutrina = sub(", [0-9]{4}", "", Doutrina)) %>%       # remove the year
  full_join(x = ., y = ., by = "Numero") %>%                   # join data to itself by Numero
  select(Doutrina = Doutrina.x, Doutrina2 = Doutrina.y) %>%    # keep only name columns
  filter(Doutrina != Doutrina2) %>%                            # remove self-reference rows
  filter(Doutrina < Doutrina2) %>%                             # only keep rows for one direction of edge/link
  group_by(Doutrina, Doutrina2) %>%
  summarise(N = n(), .groups = "drop")                         # count how often each pair co-occurs
#> # A tibble: 4 x 3
#>   Doutrina    Doutrina2     N
#>   <chr>       <chr>     <int>
#> 1 MAXIMILIANO MILARE        2
#> 2 MAXIMILIANO SILVA         1
#> 3 MILARE      SILVA         1
#> 4 SEGUIN      SILVA         3
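One thing to watch for: once the year is removed, an author cited twice in the same process (e.g. two different works by SILVA) shows up as two identical rows, and the self-join then counts that pair more than once. Below is a minimal sketch of the same self-join idea with a `distinct()` step added to guard against that. The `cites` data and the process ids "P1" to "P3" are made up purely for illustration, and the `relationship = "many-to-many"` argument assumes dplyr 1.1.0 or later, where it should silence the many-to-many join warning.

```r
library(dplyr)

# Hypothetical input, same column names as the question's data.
cites <- tribble(
  ~Doutrina,      ~Numero,
  "SILVA, 2009",  "P1",
  "SILVA, 2011",  "P1",   # same author cited twice in one process
  "SEGUIN, 2000", "P1",
  "SILVA, 2009",  "P2",
  "SEGUIN, 2000", "P2",
  "MILARE, 2015", "P3"    # cited alone, so it contributes no edge
)

edges <- cites %>%
  mutate(Doutrina = sub(", [0-9]{4}$", "", Doutrina)) %>%  # remove the trailing year
  distinct(Doutrina, Numero) %>%                           # one row per author per process
  inner_join(x = ., y = ., by = "Numero",
             relationship = "many-to-many") %>%            # assumes dplyr >= 1.1.0
  filter(Doutrina.x < Doutrina.y) %>%                      # drops self-pairs and mirrored pairs
  count(Doutrina = Doutrina.x, Doutrina2 = Doutrina.y, name = "N")

edges
```

Here SEGUIN and SILVA end up with N = 2 (one per process); without the `distinct()` step, process "P1" would contribute that pair twice and N would be 3. Whether repeated citations of the same author should count once or several times depends on what you want the edge weight to mean, so drop that step if you prefer the latter.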
Upvotes: 1