Reputation: 31
I'm not sure whether this is possible, or how to arrive at a good solution for the following R problem.
Data / Background / Structure: I've collected a large dataset of project-based cooperation data, which maps specific projects to the participating companies (this can be understood as a bipartite edge list for social network analysis). For analytical reasons it is advisable to split the whole dataset into subsets by location and time period. I've therefore created the following data structure:
sna.location.list
  [[1]] (location1)
    [[1]] (is a dataframe containing the bip. edge-list for time-period1)
    [[2]] (is a dataframe containing the bip. edge-list for time-period2)
    ...
    [[20]] (is a dataframe containing the bip. edge-list for time-period20)
  [[2]] (location2)
    ... (same as 1)
  ...
  [[32]] (location32)
    ...
Every dataframe contains a project id and the corresponding company ids.
My goal now is to transform the bipartite edge lists into one-mode networks, then run some further SNA-related calculations (degree, centralization, status, community detection, etc.) and save the results.
I know how to do these calculation steps for one(!) specific network, but I'm having a really hard time automating the process for all of the networks in the list structure described above at once, and saving the various outputs (node-level and network-level variables) in a similar structure.
I've already looked into several for-loop and apply approaches, but I still can't work out how to do this, and right now I feel quite helpless. Any help or suggestions would be highly appreciated. If you need more information or examples in order to give a brief demo or code example of how to tackle such a nested structure and run these SNA-related calculations/modifications for all of the aforementioned subsets in an efficient, automated way, please feel free to contact me.
Upvotes: 0
Views: 277
Reputation: 146070
Let's say you have a function foo that you want to apply to each data frame. Those data frames are in lists, so lapply(that_list, foo) is what we want. But you've got a bunch of lists, so we actually want to lapply that first lapply across the outer list, hence lapply(that_list, lapply, foo). (The foo will be passed along to the inner lapply via the ... argument.) If you wish to be more explicit, you can use an anonymous function instead: lapply(that_list, function(x) lapply(x, foo)).
You haven't given a reproducible example, so I'll demonstrate applying the nrow function to a list of built-in data frames:
d = list(
  list(mtcars, iris),
  list(airquality, faithful)
)
result = lapply(d, lapply, nrow)
result
# [[1]]
# [[1]][[1]]
# [1] 32
#
# [[1]][[2]]
# [1] 150
#
#
# [[2]]
# [[2]][[1]]
# [1] 153
#
# [[2]][[2]]
# [1] 272
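As you can see, the output is a list with the same structure. If you need the names, note that lapply already keeps the names of a named list; if you iterate over a character vector instead, you can switch to sapply with simplify = FALSE, which uses the vector's values as names.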
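To make the point about names concrete, here is a small sketch using the same built-in data frames. The names location1, period1, etc. are made up for illustration and only mirror the location/time-period nesting from the question.

# Hypothetical names; the nesting mirrors location -> time period.
d_named <- list(
  location1 = list(period1 = mtcars, period2 = iris),
  location2 = list(period1 = airquality, period2 = faithful)
)

# lapply() keeps the names of a named list at both levels.
res <- lapply(d_named, lapply, nrow)
res$location2$period1

# Iterating over a character vector: sapply(simplify = FALSE) returns a
# list and uses the vector's values as the names of that list.
locs <- names(d_named)
res2 <- sapply(locs, function(loc) lapply(d_named[[loc]], nrow),
               simplify = FALSE)
names(res2)

Both routes should give the same named, nested list, so individual results can be pulled out with $ or [[ ]] by name rather than by position.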
This covers applying functions to a nested list and saving the returns in a similar data structure. If you need help with calculation efficiency, parallelization, etc., I'd suggest asking a separate question focused on that, with a reproducible example.
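For the bipartite setting in the question, the same pattern might look roughly like the sketch below. Everything here is an assumption, since no data was posted: the column names project and company, the helper names project_companies and summarise_network, and the toy data are all hypothetical. It uses only base R and computes just degree and density; a package-based measure would replace those lines.

# Hypothetical helper: project one bipartite edge list onto companies.
# Assumes columns named `project` and `company` (not given in the question).
project_companies <- function(el) {
  # project x company incidence matrix (1 = company took part in project)
  inc <- (unclass(table(el$project, el$company)) > 0) * 1
  # company x company matrix: number of shared projects
  adj <- crossprod(inc)
  diag(adj) <- 0
  adj
}

# Hypothetical helper: return node-level and network-level output together.
summarise_network <- function(el) {
  adj <- project_companies(el)
  n <- nrow(adj)
  degree <- unname(rowSums(adj > 0))
  list(
    node = data.frame(company = rownames(adj), degree = degree),
    network = data.frame(n_companies = n,
                         density = sum(adj > 0) / (n * (n - 1)))
  )
}

# Toy data in the location -> time period structure.
toy <- list(
  location1 = list(
    period1 = data.frame(project = c("p1", "p1", "p2", "p2"),
                         company = c("a", "b", "b", "c")),
    period2 = data.frame(project = c("p3", "p3", "p3"),
                         company = c("a", "b", "c"))
  ),
  location2 = list(
    period1 = data.frame(project = c("p4", "p4"),
                         company = c("d", "e"))
  )
)

results <- lapply(toy, lapply, summarise_network)
results$location1$period1$node
results$location1$period1$network

Returning a list with a node element and a network element from the inner function is one way to keep both kinds of output in the same nested structure as the input. If you prefer igraph, the adjacency matrix could be handed to graph_from_adjacency_matrix inside summarise_network instead.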
Upvotes: 1