Reputation: 60
I have a follow-up question to my answer on the question "Matching vertex attributes across a list of edgelists R".
My solution was to use for loops, but we should always try to optimize(vectorize) when we can.
What I'm trying to understand is how I would vectorize the solution I made in the post.
My solution was
for(i in 1:length(graph_list)){
graph_list[[i]]=set_vertex_attr(graph_list[[i]],"gender", value=attribute_df$gender[match(V(graph_list[[i]])$name, attribute_df$names)])
}
Ideally we could vectorize this with lapply, but I'm having some trouble conceiving how to do that. Here's what I've got:
graph_lists_new=lapply(graph_list, set_vertex_attr, value=attribute_df$gender[match(V(??????????)$name, attribute_df$names)]))
What I'm unclear about is what I'd put in place of the ??????. The thing inside the V() function should be each item in the list, but I don't see what to put there when I'm using lapply.
All data can be found in the link I posted, but here's the data anyway
attribute_df<- structure(list(names = structure(c(6L, 7L, 5L, 2L, 1L, 8L, 3L,
4L), .Label = c("Andy", "Angela", "Eric", "Jamie", "Jeff", "Jim",
"Pam", "Tim"), class = "factor"), gender = structure(c(3L, 2L,
3L, 2L, 3L, 1L, 1L, 2L), .Label = c("", "F", "M"), class = "factor"),
happiness = c(8, 9, 4.5, 5.7, 5, 6, 7, 8)), class = "data.frame", row.names = c(NA,
-8L))
edgelist<-list(structure(list(nominator1 = structure(c(3L, 4L, 1L, 2L), .Label = c("Angela",
"Jeff", "Jim", "Pam"), class = "factor"), nominee1 = structure(c(1L,
2L, 3L, 2L), .Label = c("Andy", "Angela", "Jeff"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L)), structure(list(nominator2 = structure(c(4L, 1L, 2L, 3L
), .Label = c("Eric", "Jamie", "Oscar", "Tim"), class = "factor"),
nominee2 = structure(c(1L, 3L, 2L, 3L), .Label = c("Eric",
"Oscar", "Tim"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L)))
library(igraph)  # provides graph_from_data_frame, V and set_vertex_attr
graph_list<- lapply(edgelist, graph_from_data_frame)
Upvotes: 0
Views: 49
Reputation: 145765
Since you need to use graph_list[[i]] multiple times in your call, to use lapply you need to write a custom function, such as this anonymous function. (It's the same code as your loop; I just wrapped it in function(x) and replaced all instances of graph_list[[i]] with x.)
graph_list = lapply(graph_list, function(x)
set_vertex_attr(x, "gender", value = attribute_df$gender[match(V(x)$name, attribute_df$names)])
)
(Note that I didn't test this, but it should work unless I made a typo.)
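If you want to check the equivalence yourself, here is a minimal sketch that runs both versions and compares the resulting attributes. It uses a small made-up data set (the names and edges below are hypothetical stand-ins, not the data from the question) and assumes the igraph package is installed.

```r
library(igraph)

# Hypothetical toy data, smaller than the data in the question
attribute_df <- data.frame(
  names  = c("Jim", "Pam", "Andy", "Tim"),
  gender = c("M", "F", "M", "M"),
  stringsAsFactors = FALSE
)
edgelist <- list(
  data.frame(from = c("Jim", "Pam"), to = c("Andy", "Jim"),
             stringsAsFactors = FALSE),
  data.frame(from = c("Tim", "Oscar"), to = c("Oscar", "Tim"),
             stringsAsFactors = FALSE)
)
graph_list <- lapply(edgelist, graph_from_data_frame)

# for-loop version, as in the question
graphs_for <- graph_list
for (i in seq_along(graphs_for)) {
  graphs_for[[i]] <- set_vertex_attr(
    graphs_for[[i]], "gender",
    value = attribute_df$gender[match(V(graphs_for[[i]])$name, attribute_df$names)]
  )
}

# lapply version with an anonymous function
graphs_apply <- lapply(graph_list, function(x)
  set_vertex_attr(x, "gender",
                  value = attribute_df$gender[match(V(x)$name, attribute_df$names)])
)

# Pull the attribute out of every graph and compare the two approaches
genders_for   <- lapply(graphs_for, vertex_attr, name = "gender")
genders_apply <- lapply(graphs_apply, vertex_attr, name = "gender")
identical(genders_for, genders_apply)
```

Vertices with no row in attribute_df ("Oscar" in this toy data) get NA from match() in both versions.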
lapply isn't vectorization; it's just "loop hiding". In this case, I think your for loop is a nicer way to do things than lapply. Especially since you are modifying existing objects, your simple for loop will probably be more efficient than an lapply solution, as well as more readable.
When we talk about vectorization for efficiency, we almost always mean atomic vectors, not lists. (It's vectorization, after all, not listization.) The reason to use lapply and related functions (sapply, vapply, Map, most of the purrr package) isn't computer efficiency; it's readability and human efficiency when writing code.
Let's say you have a list of data frames, my_list = list(iris, mtcars, CO2). If you want to get the number of rows for each of the data frames in the list and store it in a variable, you could use sapply or a for loop:
# easy to write, easy to read
rows_apply = sapply(my_list, nrow)
# annoying to read and write
rows_for = integer(length(my_list))
for (i in seq_along(my_list)) rows_for[i] = nrow(my_list[[i]])
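As a side note, vapply is a stricter variant of the sapply call: you declare the expected type and length of each result, and it raises an error if a result doesn't match. A minimal sketch on the same built-in data sets:

```r
my_list <- list(iris, mtcars, CO2)

# integer(1) declares that each call must return a single integer
rows_vapply <- vapply(my_list, nrow, integer(1))
rows_vapply
```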
But the more complex your task gets, the more readable a for loop becomes compared to alternatives like these. In your case, I'd prefer the for loop.
For more reading on this, see the old question "Is apply more than syntactic sugar?". Since those answers were written, R has been upgraded to include a just-in-time compiler, which further speeds up for loops relative to the *apply functions. In the nearly 10-year-old answers there, you'll see that sometimes *apply is slightly faster than a for loop. Since the JIT compiler, I think you'll find the opposite: most of the time a for loop is slightly faster than *apply.
But in both of those cases, unless you're doing something absolutely trivial inside the for/apply, whatever you do inside for/apply will dominate the timings.
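If you want to measure this on your own machine, here is a rough sketch using system.time on a deliberately trivial task (the task and the size n are arbitrary choices for illustration). Timings depend on your R version and hardware, so I'm not quoting any numbers; a benchmarking package would give more reliable results than a single run.

```r
n <- 1e5
x <- runif(n)

# Explicit for loop with a preallocated result
res_for <- numeric(n)
time_for <- system.time(
  for (i in seq_len(n)) res_for[i] <- sqrt(x[i]) + 1
)

# vapply: the same per-element work, with the loop hidden
time_apply <- system.time(
  res_apply <- vapply(x, function(v) sqrt(v) + 1, numeric(1))
)

# True vectorization on the atomic vector, for comparison
time_vec <- system.time(
  res_vec <- sqrt(x) + 1
)

# Elapsed seconds for each approach
c(for_loop   = time_for[["elapsed"]],
  vapply     = time_apply[["elapsed"]],
  vectorized = time_vec[["elapsed"]])
```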
Upvotes: 2