rjturn

igraph graph.data.frame silently converts factors to character vectors

Today I learned that igraph silently drops factors in graph.data.frame: factor columns in the vertex data frame are converted to character vectors. Is there a way to retain the factor type, e.g. for V(g)$factor_var and for df$factor_var after df <- get.data.frame(g, what="vertices")? In the following code, gender is the factor variable:

library(igraph)
actors <- data.frame(name=c("Alice", "Bob", "Cecil", "David", "Esmeralda"),
                     age=c(48,33,45,34,21),
                     gender=factor(c("F","M","F","M","F")))
relations <- data.frame(from=c("Bob", "Cecil", "Cecil", "David",
                               "David", "Esmeralda"),
                        to=c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
                        same.dept=c(FALSE,FALSE,TRUE,FALSE,FALSE,TRUE),
                        friendship=c(4,5,5,2,1,1), advice=c(4,5,5,4,2,3))
g <- graph.data.frame(relations, directed=TRUE, vertices=actors)
g_actors <- get.data.frame(g, what="vertices")

# Compare type of gender (before and after)
is.factor(actors$gender)    # TRUE
is.factor(g_actors$gender)  # FALSE

In this reproducible example, actors$gender is a factor but g_actors$gender is not. In my opinion, it should be. I found no comment about this issue in the documentation.

This is important because exporting the vertices via get.data.frame for a linear regression loses the factors (linear regression converts factors to dummy variables, but ignores character vectors). I noticed because my factor variables disappeared from the output.
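For reference, here is a minimal base-R sketch of the dummy coding mentioned above. It calls model.matrix(), which builds the design matrix that lm() uses, on a small made-up gender column; with the default treatment contrasts the first level is the baseline.

```r
# Small illustrative data frame (made-up values)
d <- data.frame(gender = factor(c("F", "M", "F")))

# Treatment (dummy) coding: level "F" is the baseline,
# so only an indicator column for "M" is created
mm <- model.matrix(~ gender, data = d)
print(colnames(mm))
print(mm[, "genderM"])
```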

Of course, I can recreate the factors after exporting from igraph, but this is tedious because I have many graphs, and the recreated factors end up with the wrong level ordering. I also do not believe it should be necessary, unless igraph cannot support this behavior across its C and Python versions.
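If you do go the "recreate afterwards" route, the level-ordering problem can be avoided by copying the levels from the original data frame instead of calling factor() on its own. A minimal sketch, assuming the original data frame (actors here) is still available as a template; restore_factors() is a hypothetical helper, not part of igraph.

```r
# Hypothetical helper (not part of igraph): re-apply the factor columns of
# `template` (e.g. the original actors data frame) to `exported`
# (e.g. the result of get.data.frame(g, what = "vertices")).
# Using levels(template[[col]]) keeps the original level order instead of
# the alphabetical order that a bare factor() call would produce.
restore_factors <- function(exported, template) {
  for (col in names(template)) {
    if (is.factor(template[[col]]) && col %in% names(exported)) {
      exported[[col]] <- factor(exported[[col]],
                                levels = levels(template[[col]]),
                                ordered = is.ordered(template[[col]]))
    }
  }
  exported
}

# Usage with made-up data: "M" deliberately comes before "F" in the levels
actors <- data.frame(name = c("Alice", "Bob"),
                     gender = factor(c("F", "M"), levels = c("M", "F")),
                     stringsAsFactors = FALSE)
exported <- data.frame(name = c("Alice", "Bob"), gender = c("F", "M"),
                       stringsAsFactors = FALSE)
restored <- restore_factors(exported, actors)
print(levels(restored$gender))
```

For many graphs, the same call can be applied to each exported data frame, e.g. with lapply(). Values that do not appear among the template's levels become NA.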

Ryan

Answers (1)

Gabor Csardi

Yes, graph.data.frame has

newval <- d[, i]
if (class(newval) == "factor") {
  newval <- as.character(newval)
}
attrs[[names(d)[i]]] <- newval

so it converts factors to character vectors. I am not sure why, but it has been there forever: https://github.com/igraph/igraph/blame/c5849a89739c0dd058ff0a770aff2443745636fa/interfaces/R/igraph/R/structure.generators.R#L602
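The effect of that check can be reproduced in plain base R, without igraph. A minimal sketch, using a made-up factor whose levels are deliberately not in alphabetical order. It uses is.factor() instead of the quoted class() comparison, which behaves the same for a plain factor.

```r
# Mimic the quoted conversion on a factor with non-alphabetical levels
newval <- factor(c("F", "M", "F"), levels = c("M", "F"))
if (is.factor(newval)) {
  newval <- as.character(newval)  # levels are discarded here
}
print(newval)

# Rebuilding with factor() falls back to sorted (alphabetical) levels,
# so the original order c("M", "F") is lost
print(levels(factor(newval)))
```

This is also why recreating the factors afterwards with a bare factor() call gives a different level order than the original data frame.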

As a workaround, you can create a copy of the function, under a different name, and remove these three lines.
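Removing those lines amounts to skipping the conversion inside the attribute loop. As a hedged illustration only, the standalone helper below mirrors the quoted loop with a switch for the conversion, which is roughly what an opt-out option could look like. df_to_attrs() and its keep_factors argument are hypothetical names, not igraph functions, and wiring the helper into a renamed copy of graph.data.frame is not shown.

```r
# Hypothetical helper (not igraph's API): builds a named attribute list from
# the selected columns of a data frame, mirroring the quoted loop.
# keep_factors = FALSE reproduces the factor-to-character conversion;
# keep_factors = TRUE leaves factor columns untouched.
df_to_attrs <- function(d, cols = seq_len(ncol(d)), keep_factors = FALSE) {
  attrs <- list()
  for (i in cols) {
    newval <- d[, i]
    if (!keep_factors && is.factor(newval)) {
      newval <- as.character(newval)
    }
    attrs[[names(d)[i]]] <- newval
  }
  attrs
}

# Made-up data; column 2 (gender) is the only attribute column selected
actors <- data.frame(name = c("Alice", "Bob"),
                     gender = factor(c("F", "M")),
                     stringsAsFactors = FALSE)
str(df_to_attrs(actors, cols = 2))                       # gender as character
str(df_to_attrs(actors, cols = 2, keep_factors = TRUE))  # gender as factor
```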

If you think that this is a bug, then please also open an issue at https://github.com/igraph/igraph/issues and I'll add an option not to convert. I think the default will still be to convert, because the conversion has been there for a long time and people might rely on it.
