Reputation: 1092
I’m having a bit of trouble using graph_from_data_frame properly. It fails with "ERROR: ... the data frame should contain at least two columns" even though my data frame already has more than two.
I have a data frame; let's use a cohort of students as an example.
Each row is a student name, and there are multiple columns of metadata, most of which are irrelevant. I would like to use one specific column, “Class”, denoting which class each student is in (let's say there are 15 classes of 30 students each). I would like to make a graph such that every student is a vertex, and students with the same value in the “Class” column are joined by an undirected edge.
What would this command look like?
Just an update to add some context: the number of nodes/edges I wished to plot was incredibly large (it's not literally a class of students), so much so that the 1-to-1 representations used in the examples would be infeasible. Hence, I was looking for a more efficient way to encode edges.
Upvotes: 0
Views: 1168
Reputation: 36
library(tidyverse)
library(igraph)
df = tibble(
  class = c("1","1","1","2","2","2","3","3","3"),
  name = c("a","b","c","d","e","f","g","h","i")
)
names = df %>% select(name)
relations = df %>%
  mutate(name2 = df$name)
# Start from an empty edge list so the loop works for any class labels,
# not only when the first class happens to be "1"
edge_list = tibble(name = character(), name2 = character())
for (i in unique(df$class)) {
  from = relations %>%
    filter(class == i) %>%
    select(name)
  to = relations %>%
    filter(class == i) %>%
    select(name2)
  # Form relationships between all students in each class
  edge_list = bind_rows(edge_list, tidyr::crossing(from, to))
}
# Prevent self-loop edges and duplicate relationships
edge_list = edge_list %>% filter(name != name2)
edge_list = edge_list[!duplicated(t(apply(edge_list, 1, sort))), ]
plot(graph_from_data_frame(edge_list, directed = FALSE, vertices = names))
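For larger inputs, the per-class loop can probably be replaced by a single self-join on the class column. The following is only a sketch under assumptions, not a tested solution for the asker's real data: the column names name_from and name_to and the object names pairs and g are made up for illustration, and the toy df mirrors the one above. It also shows what likely avoids the error in the question: the first argument of graph_from_data_frame has to be the edge list (at least two columns, one row per edge), while the one-column table of students goes in vertices.

```r
library(igraph)

# Toy data mirroring the example above (hypothetical stand-in for the real table)
df <- data.frame(
  class = c("1","1","1","2","2","2","3","3","3"),
  name  = c("a","b","c","d","e","f","g","h","i"),
  stringsAsFactors = FALSE
)

# Self-join on class: one row for every ordered pair of students sharing a class
pairs <- merge(df, df, by = "class", suffixes = c("_from", "_to"))

# Keeping only name_from < name_to drops self-loops and keeps each
# unordered pair exactly once
pairs <- pairs[pairs$name_from < pairs$name_to, c("name_from", "name_to")]

# Edge list first, vertex table second
g <- graph_from_data_frame(pairs, directed = FALSE, vertices = df["name"])
```

Note that the number of edges still grows quadratically with class size (a class of n students yields n * (n - 1) / 2 edges), so this shortens the code and avoids the row-wise sort, but it does not shrink the graph itself.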
Upvotes: 2