John-Henry

Reputation: 1817

Using `group_split` while maintaining data structure?

I am using the osfr package and I'd like to apply the function `osf_ls_files` to every row of the data returned by `osf_ls_nodes`.

library(osfr)
library(dplyr)
library(purrr)

project_data = 
  osfr::osf_retrieve_user("qs9fc") %>% 
  osf_ls_nodes() 


project_data %>% ## only applies to first element
  osf_ls_files() 
#> Warning: This is not a vectorized function. Only the first row of 10 will be
#> used.
#> # A tibble: 2 x 3
#>   name                             id                       meta            
#>   <chr>                            <chr>                    <list>          
#> 1 pre_analysis_plan.html           5e74e7554a60a50569bb5299 <named list [3]>
#> 2 pre_analysis_plan_amendment.docx 5ee79fa0af2627000d4f641c <named list [3]>

When I try a typical tidyverse vectorization pattern, the structure of the data changes and the call fails:

### When I attempt to vectorize the format changes
out <- 
  project_data %>% 
  mutate(n = row_number()) %>% 
  group_split(.keep = F) %>% 
  map(osf_ls_files)
#> Error in UseMethod("osf_ls_files"): no applicable method for 'osf_ls_files' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"

I can see an obvious way to do this with a loop, but I was wondering whether it is possible to use `group_split` or a comparable function while maintaining the data structure.

Upvotes: 2

Views: 229

Answers (1)

akrun

Reputation: 887571

We can use a `for` loop, slicing one row at a time and binding the results:

library(dplyr)
out <- vector('list', nrow(project_data))
for(i in seq_along(out)) {
    out[[i]] <- project_data %>% 
                   slice(i) %>%
                   osf_ls_files()
    }
out1 <- bind_rows(out)
out1
# A tibble: 31 x 3
#   name                             id                       meta            
#   <chr>                            <chr>                    <list>          
# 1 pre_analysis_plan.html           5e74e7554a60a50569bb5299 <named list [3]>
# 2 pre_analysis_plan_amendment.docx 5ee79fa0af2627000d4f641c <named list [3]>
# 3 Analyses                         5d85386ab3103d00185130b1 <named list [3]>
# 4 Appendix                         5d8539fc51eeee0019bb9417 <named list [3]>
# 5 Program_2017.pdf                 5eb4611e9ddd2800b6091b74 <named list [3]>
# 6 Materials shared by speakers     5eb4612d9ddd2800b70935cb <named list [3]>
# 7 Program_2018.pdf                 5eb461f3877c5e00bf3a35b2 <named list [3]>
# 8 Program_2019.pdf                 5eb46210877c5e00b93a46cb <named list [3]>
# 9 Program_2020.pdf                 5eb46222877c5e00c03a4fb6 <named list [3]>
#10 Sample Implementations           5626656b8c5e4a103c6121dd <named list [3]>
# … with 21 more rows

Or use `split` to break the data into one-row pieces and `map_dfr` to apply the function and row-bind the results:

library(purrr)
project_data %>%
     split(seq_len(nrow(.))) %>%
      map_dfr(osf_ls_files)
# A tibble: 31 x 3
#   name                             id                       meta            
#   <chr>                            <chr>                    <list>          
# 1 pre_analysis_plan.html           5e74e7554a60a50569bb5299 <named list [3]>
# 2 pre_analysis_plan_amendment.docx 5ee79fa0af2627000d4f641c <named list [3]>
# 3 Analyses                         5d85386ab3103d00185130b1 <named list [3]>
# 4 Appendix                         5d8539fc51eeee0019bb9417 <named list [3]>
# 5 Program_2017.pdf                 5eb4611e9ddd2800b6091b74 <named list [3]>
# 6 Materials shared by speakers     5eb4612d9ddd2800b70935cb <named list [3]>
# 7 Program_2018.pdf                 5eb461f3877c5e00bf3a35b2 <named list [3]>
# 8 Program_2019.pdf                 5eb46210877c5e00b93a46cb <named list [3]>
# 9 Program_2020.pdf                 5eb46222877c5e00c03a4fb6 <named list [3]>
#10 Sample Implementations           5626656b8c5e4a103c6121dd <named list [3]>
# … with 21 more rows
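
The error in the question is an S3 dispatch failure: the message shows that the object reaching `osf_ls_files` has only the classes `tbl_df`, `tbl` and `data.frame`, so presumably the osfr-specific class was dropped somewhere in the `mutate`/`group_split` pipeline. The following is a minimal base R sketch of that mechanism, not osfr code. The class `demo_tbl_node` and the generic `list_files` are hypothetical stand-ins for the osfr objects, chosen so the example runs without network access.

```r
# Hypothetical stand-ins (not part of osfr): a data-frame subclass and an
# S3 generic that only has a method for that subclass and only accepts
# one row at a time.
nodes <- structure(
  data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE),
  class = c("demo_tbl_node", "data.frame")
)

list_files <- function(x) UseMethod("list_files")
list_files.demo_tbl_node <- function(x) {
  stopifnot(nrow(x) == 1L)  # not vectorized over rows
  data.frame(name = paste0(x$id, c("_1.txt", "_2.txt")),
             stringsAsFactors = FALSE)
}

# 1. Once the subclass is gone, dispatch fails in the same way as in the
#    question, because no method matches the remaining classes.
plain <- nodes
class(plain) <- "data.frame"
msg <- tryCatch(list_files(plain), error = function(e) conditionMessage(e))

# 2. Base split() subsets with `[`, which keeps the class of a data-frame
#    subclass, so every one-row piece can still be dispatched on.
pieces <- split(nodes, seq_len(nrow(nodes)))
piece_classes <- lapply(pieces, class)

# 3. Apply the method to each piece and row-bind the results.
files <- do.call(rbind, lapply(pieces, list_files))
rownames(files) <- NULL
```

When a pipeline fails like this, calling `class()` on the intermediate result of each step is a quick way to find where the subclass is lost.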

Upvotes: 1
