Create graph from data frame - rows as vertices and common column values as edges?

Question

I’m having trouble using graph_from_data_frame properly. It fails with "ERROR: ... the data frame should contain at least two columns", even though my data frame already has more than two.

I have a data frame; let's use a cohort of students as an example.

Each row is a student (identified by name), and there are multiple columns of metadata, most of which are irrelevant. I would like to use one specific column, “Class”, which denotes the class each student is in (say there are 15 classes of 30 students each). I would like to build a graph in which every student is a vertex and students with the same value in the “Class” column are joined by an undirected edge.

What would this command look like?
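A minimal base-R sketch of the edge rule described above, assuming a hypothetical toy data frame `students` with a `name` column and a `Class` column. A self-join on `Class` pairs up every two students in the same class. Keeping only rows where the first name sorts before the second then drops both the self-loops and the reversed duplicate of each pair.

```r
# Hypothetical toy data: `name` and `Class` stand in for the real columns
students <- data.frame(
  name  = c("a", "b", "c", "d", "e", "f", "g"),
  Class = c("1", "1", "1", "2", "2", "3", "3"),
  stringsAsFactors = FALSE
)

# Self-join on Class: every ordered pair of rows sharing a Class value
pairs <- merge(students, students, by = "Class")

# Keep each unordered pair once and drop self-loops (a-a, b-b, ...)
edges <- pairs[pairs$name.x < pairs$name.y, c("name.x", "name.y")]
names(edges) <- c("from", "to")
rownames(edges) <- NULL

edges
```

Because graph_from_data_frame() reads the first two columns of its data frame as the endpoints of each edge, it is `edges` (not the one-row-per-student table) that belongs in its first argument. The student table can go in `vertices`, e.g. `graph_from_data_frame(edges, directed = FALSE, vertices = students)`. Note that the self-join materializes every ordered pair within a class before filtering, so its intermediate size grows with the square of the class size.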

Update, for context: the number of nodes and edges I want to plot is very large (it isn't literally a class of students), so large that the pairwise (1-to-1) edge representation used in the examples would be infeasible. Hence I was looking for a more efficient way to encode the edges.
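To put rough numbers on the size concern, here is a short calculation using the question's illustrative figures (15 classes of 30). Encoding "same class" as one edge per pair of classmates needs choose(30, 2) = 435 edges per class, whereas a two-mode (bipartite) encoding, with one edge from each student to a node representing their class, needs only one edge per student. The toy `students` data and the `class_` prefix for class nodes are hypothetical choices for this sketch.

```r
# Illustrative sizes from the question: 15 classes of 30 students each
class_sizes <- rep(30, 15)

# Pairwise encoding: a clique inside each class, choose(n, 2) edges per class
pairwise_edges <- sum(choose(class_sizes, 2))

# Bipartite encoding: one student-to-class edge per student
bipartite_edges <- sum(class_sizes)

c(pairwise = pairwise_edges, bipartite = bipartite_edges)

# Building the bipartite edge list itself is a single vectorized step
students <- data.frame(
  name  = c("a", "b", "c", "d", "e", "f"),
  Class = c("1", "1", "1", "2", "2", "2"),
  stringsAsFactors = FALSE
)
bip_edges <- data.frame(
  from = students$name,
  # hypothetical prefix keeps class node ids distinct from student names
  to   = paste0("class_", students$Class),
  stringsAsFactors = FALSE
)
```

igraph's bipartite_projection() can turn such a student-class graph back into the student-student graph, but that reintroduces every pairwise edge, so the bipartite form mainly helps when the pairs never need to be listed explicitly.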

AzureOSK · Accepted Answer

```r
library(tidyverse)
library(igraph)

# One row per student; only the "class" column is used to build edges
df = tibble(
  class = c("1","1","1","2","2","2","3","3","3"),
  name  = c("a","b","c","d","e","f","g","h","i")
)

vertices_df = df %>% select(name)  # vertex table: one row per student
relations = df %>%
  mutate(name2 = name)             # second copy of name to pair against

edge_list = NULL
for (i in unique(df$class)) {
  from = relations %>%
    filter(class == i) %>%
    select(name)

  to = relations %>%
    filter(class == i) %>%
    select(name2)

  # Form relationships between all students in each class
  # (bind_rows() ignores the initial NULL, so no special first iteration)
  edge_list = bind_rows(edge_list, tidyr::crossing(from, to))
}

# Prevent self-loop edges and duplicate relationships
edge_list = edge_list %>% filter(name != name2)
edge_list = edge_list[!duplicated(t(apply(edge_list, 1, sort))), ]

plot(graph_from_data_frame(edge_list, directed = FALSE, vertices = vertices_df))
```
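A loop-free alternative to the answer's approach, sketched in base R on the same toy data. split() groups the names by class and combn(members, 2) lists each unordered pair of classmates exactly once, so no self-loop or duplicate filtering is needed afterwards. This still produces one edge per pair, so it does not reduce the edge count for very large groups.

```r
# Same toy data as the answer: three classes of three students
df <- data.frame(
  class = c("1","1","1","2","2","2","3","3","3"),
  name  = c("a","b","c","d","e","f","g","h","i"),
  stringsAsFactors = FALSE
)

# For each class, list every unordered pair of its students exactly once.
# combn() needs at least two elements, so singleton classes yield no edges.
pairs_per_class <- lapply(split(df$name, df$class), function(members) {
  if (length(members) < 2) return(NULL)
  t(combn(members, 2))
})

# Stack the per-class pairs (assumes at least one class has 2+ students)
edge_mat <- do.call(rbind, pairs_per_class)
edge_list <- data.frame(from = edge_mat[, 1], to = edge_mat[, 2],
                        stringsAsFactors = FALSE)
```

The resulting `edge_list` can be passed to `graph_from_data_frame(edge_list, directed = FALSE, vertices = df[, c("name", "class")])`. The columns are reordered there because the first column of the vertex data frame must hold the vertex names.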
