Sangwon Steve Lee

Reputation: 57

How to write efficient code for importing multiple SAS files

I am struggling to write efficient code for importing SAS data files.

My code is the following:

library(foreign)
library(haven)
f <- file.path(path = "E:/Cohortdata/Raw cohort/Nationalscreeningcohort/01.jk",
               c("nhis_heals_jk_2002.sas7bdat", "nhis_heals_jk_2003.sas7bdat", "nhis_heals_jk_2004.sas7bdat",
                 "nhis_heals_jk_2005.sas7bdat", "nhis_heals_jk_2006.sas7bdat", "nhis_heals_jk_2007.sas7bdat",
                 "nhis_heals_jk_2008.sas7bdat", "nhis_heals_jk_2009.sas7bdat", "nhis_heals_jk_2010.sas7bdat",
                 "nhis_heals_jk_2011.sas7bdat", "nhis_heals_jk_2012.sas7bdat", "nhis_heals_jk_2013.sas7bdat"))
d <- lapply(f, read_sas)

I know rewriting it with a for loop would be much more efficient, but I don't know what the code should look like.

I would be very thankful if you could help me.

Upvotes: 1

Views: 1119

Answers (1)

DJV

Reputation: 4863

It's a variation of code that I posted here, but you can use it for SAS files too.

Please note that instead of file.path() I used list.files(), which lists all the files under the path "E:/Cohortdata/Raw cohort/Nationalscreeningcohort" (where I assume your files are). In addition, I used the pattern argument to match only .sas7bdat files.

list.files() returns a vector, so from here you can use whichever *apply method you'd like. However, I like converting the vector to a tbl_df and using the tidyverse approach: reading all the files with purrr::map() (part of the tidyverse) and creating one big tbl_df of all the files.

library(tidyverse)
library(haven)

# List every .sas7bdat file under the path, read each one with read_sas(),
# and stack the results into one big data frame
df <- list.files(path = "E:/Cohortdata/Raw cohort/Nationalscreeningcohort",
                 full.names = TRUE,
                 recursive = TRUE,
                 pattern = "\\.sas7bdat$") %>%   # pattern is a regex, so escape the dot
  tbl_df() %>%
  mutate(data = map(value, read_sas)) %>%
  unnest(data)
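For completeness, since the question asks what a plain loop would look like: a base-R sketch could read each file in a for loop and then bind the rows, assuming all the yearly files share the same columns (the path and year range below are taken from the question):

```r
library(haven)

path <- "E:/Cohortdata/Raw cohort/Nationalscreeningcohort/01.jk"
files <- file.path(path, sprintf("nhis_heals_jk_%d.sas7bdat", 2002:2013))

# Read each file into a pre-allocated list, then bind the rows together
d <- vector("list", length(files))
for (i in seq_along(files)) {
  d[[i]] <- read_sas(files[i])
}
combined <- do.call(rbind, d)
```

Note that the loop body is equivalent to `d <- lapply(files, read_sas)`; the real gain over the original code is not the loop itself but pre-building the file names with sprintf() and combining the results into a single data frame afterwards.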

Upvotes: 4
