Reputation: 323
Today I learned that igraph silently loses factors on graph.data.frame, so factors in the vertex data frame are converted to character vectors. Is there a way to retain the factor type, e.g. for V(g)$factor_var and df <- get.data.frame(g, what="vertices"); df$factor_var? In the following code, gender is the factor_var:
actors <- data.frame(name=c("Alice", "Bob", "Cecil", "David", "Esmeralda"),
                     age=c(48,33,45,34,21),
                     gender=factor(c("F","M","F","M","F")))
relations <- data.frame(from=c("Bob", "Cecil", "Cecil", "David",
                               "David", "Esmeralda"),
                        to=c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
                        same.dept=c(FALSE,FALSE,TRUE,FALSE,FALSE,TRUE),
                        friendship=c(4,5,5,2,1,1), advice=c(4,5,5,4,2,3))
g <- graph.data.frame(relations, directed=TRUE, vertices=actors)
g_actors <- get.data.frame(g, what="vertices")
# Compare type of gender (before and after)
is.factor(actors$gender)
is.factor(g_actors$gender)
In this reproducible example, actors$gender is a factor but g_actors$gender is not. In my opinion, it should be. I found no comment about this issue in the documentation.
This is important because exporting vertices via get.data.frame for linear regression loses factors (linear regression converts factors to dummy variables, but ignores character vectors). I noticed because my factor variables disappeared in the output.
Of course, I can recreate the factors after exporting from igraph, but this is tedious because I have a lot of graphs and the level ordering comes out wrong (and I do not believe it should be necessary, unless igraph cannot support this behavior across its C++ and Python versions).
Ryan
Upvotes: 2
Views: 697
Reputation: 10825
Yes, graph.data.frame has:
newval <- d[, i]
if (class(newval) == "factor") {
  newval <- as.character(newval)
}
attrs[[names(d)[i]]] <- newval
so it converts factors to characters. I am not sure why, but it has been there forever: https://github.com/igraph/igraph/blame/c5849a89739c0dd058ff0a770aff2443745636fa/interfaces/R/igraph/R/structure.generators.R#L602
As a workaround, you can create a copy of the function, under a different name, and remove these three lines.
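If copying the function is not appealing, a lighter alternative is to restore the factors after exporting. The following is only a minimal sketch, assuming the original vertex data frame is still around to serve as a template. restore_factors is a hypothetical helper, not part of igraph, and the second data frame below is just a stand-in for what get.data.frame(g, what="vertices") returns, so the snippet runs in base R.

```r
# Hypothetical helper (not part of igraph): re-create factor columns in an
# exported vertex data frame, using the original data frame as the template.
restore_factors <- function(exported, original) {
  # Only consider columns present in both frames that were factors originally
  shared <- intersect(names(exported), names(original))
  factor_cols <- shared[vapply(original[shared], is.factor, logical(1))]
  for (col in factor_cols) {
    # Reuse the original levels (and ordered-ness) so level order is preserved
    exported[[col]] <- factor(as.character(exported[[col]]),
                              levels = levels(original[[col]]),
                              ordered = is.ordered(original[[col]]))
  }
  exported
}

# Stand-in for the original vertex data, with a non-alphabetical level order
actors <- data.frame(name = c("Alice", "Bob", "Cecil"),
                     gender = factor(c("F", "M", "F"), levels = c("M", "F")),
                     stringsAsFactors = FALSE)

# Stand-in for the exported vertex data: gender has become a character vector
g_actors <- data.frame(name = actors$name,
                       gender = as.character(actors$gender),
                       stringsAsFactors = FALSE)

g_actors <- restore_factors(g_actors, actors)
is.factor(g_actors$gender)
levels(g_actors$gender)
```

Because the levels are taken from the original column rather than re-derived from the values, the level ordering survives the round trip, and the same helper can be applied to each graph in a loop.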
If you think that this is a bug, then please also open an issue at https://github.com/igraph/igraph/issues and I'll add an option not to convert. I think the default will still be to convert, just because it has been there for a long time, and people might rely on it.
Upvotes: 3