Using apply family and multiple functions on lists in R

Question

I have a follow-up to my own answer on the question "Matching vertex attributes across a list of edgelists R".

My solution was to use for loops, but we should always try to optimize (vectorize) when we can.

What I'm trying to understand is how I would vectorize the solution I made in the post.

My solution was

for(i in 1:length(graph_list)){
  graph_list[[i]]=set_vertex_attr(graph_list[[i]],"gender", value=attribute_df$gender[match(V(graph_list[[i]])$name, attribute_df$names)])
}

Ideally we could vectorize this with lapply, but I'm having some trouble working out how to do that. Here's what I've got:

graph_lists_new=lapply(graph_list, set_vertex_attr, value=attribute_df$gender[match(V(??????????)$name, attribute_df$names)]))

What I'm unclear about is what I'd put in the part with the ??????. The thing inside the V() function should be each item in the list, but what I don't get is what I'd put inside when I'm using lapply.

All data can be found in the link I posted, but here's the data anyway

attribute_df<- structure(list(names = structure(c(6L, 7L, 5L, 2L, 1L, 8L, 3L, 
4L), .Label = c("Andy", "Angela", "Eric", "Jamie", "Jeff", "Jim", 
"Pam", "Tim"), class = "factor"), gender = structure(c(3L, 2L, 
3L, 2L, 3L, 1L, 1L, 2L), .Label = c("", "F", "M"), class = "factor"), 
    happiness = c(8, 9, 4.5, 5.7, 5, 6, 7, 8)), class = "data.frame", row.names = c(NA, 
-8L))



edgelist<-list(structure(list(nominator1 = structure(c(3L, 4L, 1L, 2L), .Label = c("Angela", 
"Jeff", "Jim", "Pam"), class = "factor"), nominee1 = structure(c(1L, 
2L, 3L, 2L), .Label = c("Andy", "Angela", "Jeff"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L)), structure(list(nominator2 = structure(c(4L, 1L, 2L, 3L
), .Label = c("Eric", "Jamie", "Oscar", "Tim"), class = "factor"), 
    nominee2 = structure(c(1L, 3L, 2L, 3L), .Label = c("Eric", 
    "Oscar", "Tim"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L)))

library(igraph)  # provides graph_from_data_frame(), V() and set_vertex_attr()
graph_list <- lapply(edgelist, graph_from_data_frame)

Gregor Thomas · Accepted Answer

Since you need to use graph_list[[i]] multiple times in your call, to use lapply you need to write a custom function, such as this anonymous function. (It's the same code as your loop, I just wrapped it in function(x) and replaced all instances of graph_list[[i]] with x.)

graph_list = lapply(graph_list, function(x)
  set_vertex_attr(x, "gender", value = attribute_df$gender[match(V(x)$name, attribute_df$names)])
)

(Note that I didn't test this, but it should work unless I made a typo.)
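The same rewrite can be checked without igraph. This is only a minimal sketch: `lookup` and `people_list` are hypothetical stand-ins for attribute_df and graph_list (plain data frames, not graphs), so it shows the loop-to-lapply pattern rather than the igraph calls themselves.

```r
# Hypothetical stand-ins for attribute_df and graph_list (not from the question).
lookup <- data.frame(
  names  = c("Jim", "Pam", "Jeff", "Angela"),
  gender = c("M", "F", "M", "F"),
  stringsAsFactors = FALSE
)
people_list <- list(
  data.frame(name = c("Pam", "Jim"), stringsAsFactors = FALSE),
  data.frame(name = c("Angela", "Oscar", "Jeff"), stringsAsFactors = FALSE)
)

# Loop version: the element loop_result[[i]] is referenced several times.
loop_result <- people_list
for (i in seq_along(loop_result)) {
  loop_result[[i]]$gender <-
    lookup$gender[match(loop_result[[i]]$name, lookup$names)]
}

# lapply version: wrap the loop body in function(x), replace the element
# with x, and return the modified x.
apply_result <- lapply(people_list, function(x) {
  x$gender <- lookup$gender[match(x$name, lookup$names)]
  x
})

# Compare the two results.
identical(loop_result, apply_result)
```

Names with no match in the lookup table (here "Oscar") get NA, which is also what match() does in the igraph version.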

lapply isn't vectorization; it's just "loop hiding". In this case, I think your for loop is a nicer way to do things than lapply. Especially since you are modifying existing objects, your simple for loop will probably be more efficient than an lapply solution, as well as more readable.

When we talk about vectorization for efficiency, we almost always mean atomic vectors, not lists. (It's vectorization, after all, not listization.) The reason to use lapply and related functions (sapply, vapply, Map, most of the purrr package) isn't computational efficiency; it's readability and how quickly a human can write the code.
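To make the distinction concrete, here is a small sketch on an atomic vector. The vector `x` and the variable names are made up for illustration.

```r
x <- c(1.5, 2, 3.25, 4)

# Vectorized: a single call operates on the whole atomic vector,
# and the element-wise work happens in compiled code.
squares_vec <- x^2

# Explicit for loop computing the same values one element at a time.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) squares_loop[i] <- x[i]^2

# sapply still calls the R function once per element; the loop is only hidden.
squares_apply <- sapply(x, function(v) v^2)
```

All three give the same numbers. Only the first is vectorized in the sense that matters for speed.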

Let's say you have a list of data frames, my_list = list(iris, mtcars, CO2). If you want to get the number of rows of each data frame in the list and store the result in a variable, you could use sapply or a for loop:

# easy to write, easy to read
rows_apply = sapply(my_list, nrow)

# annoying to read and write
rows_for = integer(length(my_list))
for (i in seq_along(my_list)) rows_for[i] = nrow(my_list[[i]])

But the more complex your task gets, the more readable a for loop becomes compared to alternatives like these. In your case, I'd prefer the for loop.

For more reading on this, see the old question Is apply more than syntactic sugar?. Since those answers were written, R has been upgraded to include a just-in-time compiler, which further speeds up for loops relative to apply. In the nearly 10-year-old answers there, you'll see that sometimes *apply is slightly faster than a for loop. Since the JIT compiler, I think you'll find the opposite: most of the time a for loop is slightly faster than *apply.

But in both of those cases, unless you're doing something absolutely trivial inside the for/apply, whatever you do inside for/apply will dominate the timings.
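If you want to check this on your own machine, a rough sketch along these lines should do. The workload (`inputs`, `work`) is invented for illustration, and the timings will vary between machines and R versions, so treat them as indicative only.

```r
# Hypothetical workload: 200 numeric vectors of length 1e4 each.
set.seed(1)
inputs <- replicate(200, rnorm(1e4), simplify = FALSE)

# Non-trivial per-element work: sort each vector and take its median.
work <- function(v) median(sort(v))

# Time the for loop, filling a preallocated result vector.
time_for <- system.time({
  out_for <- numeric(length(inputs))
  for (i in seq_along(inputs)) out_for[i] <- work(inputs[[i]])
})

# Time the equivalent vapply call.
time_apply <- system.time(
  out_apply <- vapply(inputs, work, FUN.VALUE = numeric(1))
)

# Both approaches compute the same values; compare the elapsed times yourself.
identical(out_for, out_apply)
c(for_loop = time_for[["elapsed"]], vapply = time_apply[["elapsed"]])
```

With work like this inside the loop, I'd expect the gap between the two timings to be small next to the total, but measure rather than take my word for it.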
