Mr.Morgan

Reputation: 31

How to use List of List of Dataframes

I'm not sure whether this is possible, or how to find a good solution for the following R problem.

Data / Background / Structure: I've collected a large dataset of project-based cooperation data that maps specific projects to the participating companies (it can be understood as a bipartite edge list for social network analysis). For analytical reasons, it is advisable to split the whole dataset into subsets by location and time period. I have therefore created the following data structure:

sna.location.list 
[[1]]           (location1)
     [[1]]      (is a dataframe containing the bip. edge-list for time-period1)
     [[2]]      (is a dataframe containing the bip. edge-list for time-period2)
     ...
     [[20]]     (is a dataframe containing the bip. edge-list for time-period20)
[[2]]           (location2)
     ...         (same as 1)
 ...
[[32]]          (location32)
     ...

Every dataframe contains a project id and the corresponding company ids.

My goal now is to transform the bipartite edge lists into one-mode networks, then run some further SNA-related calculations (degree, centralization, status, community detection, etc.) and save the results.

I know how to do these calculation steps with one(!) specific network, but I'm having a really hard time automating the process for all of the networks at once in the list structure described above, and saving the various outputs (node-level and network-level variables) in a similar structure.

I have already looked into several for-loop and apply approaches, but working out how to do this is still giving me sleepless nights, and right now I feel very helpless. Any help or suggestions would be highly appreciated. If you need more information or examples in order to give me a brief demo or code example of how to tackle such a nested structure and run such SNA-related calculations/modifications for all of the aforementioned subsets in an efficient, automated way, please let me know.

Upvotes: 0

Views: 277

Answers (1)

Gregor Thomas

Reputation: 146070

Let's say you have a function foo that you want to apply to each data frame. Those data frames are in lists, so lapply(that_list, foo) is what we want. But you've got a bunch of lists, so we actually want to lapply that first lapply across the outer list, hence lapply(that_list, lapply, foo). (The foo will be passed along to the inner lapply via the ... argument.) If you wish to be more explicit, you can use an anonymous function instead: lapply(that_list, function(x) lapply(x, foo)).

You haven't given a reproducible example, so I'll demonstrate by applying the nrow function to a nested list of built-in data frames:

d = list(
  list(mtcars, iris),
  list(airquality, faithful)
)

result = lapply(d, lapply, nrow)
result
# [[1]]
# [[1]][[1]]
# [1] 32
# 
# [[1]][[2]]
# [1] 150
# 
# 
# [[2]]
# [[2]][[1]]
# [1] 153
# 
# [[2]][[2]]
# [1] 272

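As you can see, the output is a list with the same structure. If your lists are named, lapply carries those names over to the result. sapply with simplify = FALSE behaves the same way for lists, and additionally generates names when you iterate over a character vector.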
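To make the point about names concrete, here is a small sketch using the same built-in data frames. The names loc1, loc2, t1, and t2 are made up for the example and stand in for locations and time periods; with a named input list, lapply already carries the names through at both levels.

```r
# Hypothetical names (loc1, loc2, t1, t2) standing in for locations and periods
d_named <- list(
  loc1 = list(t1 = mtcars, t2 = iris),
  loc2 = list(t1 = airquality, t2 = faithful)
)

# Same nested lapply as before; names of the input lists are preserved
result_named <- lapply(d_named, lapply, nrow)

# Results can now be looked up by name instead of by position
result_named$loc2$t1
result_named[["loc1"]][["t2"]]
```

Named access tends to be less error-prone than result[[2]][[1]] once there are 32 locations and 20 periods to keep track of.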

This covers applying functions to a nested list and saving the returns in a similar data structure. If you need help with calculation efficiency, parallelization, etc., I'd suggest asking a separate question focused on that, with a reproducible example.
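For the structure described in the question, foo could be a function that takes one edge-list data frame and returns a list holding both node-level and network-level results. What follows is only a minimal base R sketch under stated assumptions, not a full SNA pipeline: the column names project_id and company_id, the toy data, and the helper one_mode_stats are all invented for illustration, and the one-mode projection simply counts shared projects between companies. A network package could replace the body of the helper.

```r
# Hypothetical helper: takes one bipartite edge list (assumed columns
# project_id and company_id) and returns node- and network-level results
one_mode_stats <- function(el) {
  # Project-by-company incidence matrix (1 if the company is in the project)
  inc <- (unclass(table(el$project_id, el$company_id)) > 0) * 1
  # Company-by-company matrix: number of projects two companies share
  adj <- crossprod(inc)
  diag(adj) <- 0
  degree <- rowSums(adj > 0)  # number of distinct partners per company
  n <- nrow(adj)
  list(
    adjacency = adj,
    node = data.frame(company_id = rownames(adj), degree = degree,
                      row.names = NULL),
    network = data.frame(
      n_companies = n,
      density = if (n > 1) sum(adj > 0) / (n * (n - 1)) else NA_real_
    )
  )
}

# Toy data (made up): 2 locations x 2 time periods
el_a <- data.frame(project_id = c(1, 1, 2, 2, 3),
                   company_id = c("A", "B", "B", "C", "A"))
el_b <- data.frame(project_id = c(1, 1, 1),
                   company_id = c("A", "B", "C"))
sna_list <- list(
  loc1 = list(t1 = el_a, t2 = el_b),
  loc2 = list(t1 = el_b, t2 = el_a)
)

# Apply the helper to every data frame; the output mirrors the input structure
results <- lapply(sna_list, lapply, one_mode_stats)

# Node-level output for one location and period
results$loc1$t1$node

# Collect network-level rows into one data frame labelled by location and period
network_table <- do.call(rbind, lapply(names(results), function(loc) {
  do.call(rbind, lapply(names(results[[loc]]), function(tp) {
    cbind(location = loc, period = tp, results[[loc]][[tp]]$network)
  }))
}))
network_table
```

Returning a list from the helper keeps node-level and network-level outputs together per network, while the final step shows one way to flatten the network-level values into a single table for comparison across locations and periods.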

Upvotes: 1
