mpvalenc
mpvalenc

Reputation: 61

Joining data in lists of list

I'm importing data from multiple excel files using the readxl package and I made a function in my script so that I only import specific sheets that I need

read_excel_sheets <- function(excelDoc) {
     sheets <- readxl::excel_sheets(excelDoc)
     sheets <- sheets[4:6]
     x <- lapply(sheets, function(X) readxl::read_excel(excelDoc, sheet = X))
     return(x)
}
#load files in folder
rawfiles <- list.files()
IMPORT <- lapply(rawfiles, FUN = read_excel_sheets)

After loading the files in my folder into my script, IMPORT becomes a list[10] that contains list[3] inside of it, basically lists inside of a list.

Unfortunately, I can't use reduce(full_join) to gather my data into one data table. I've tried working with just one excel file and using unlist() to see if I can get my sheets out of the lists of list but that did not work.

Test <- read_excel_sheets("Hop_L_Trial1.xlsx")
Test_Test <- unlist(Test)

I've also tried

rawfiles <- list.files()
IMPORT <- lapply(rawfiles,
                FUN = read_excel_sheets) 
Test_3 <- rbindlist(IMPORT) 

and received an error "Column 1 of item 1 is length 2 inconsistent with column 2 which is length 6. Only length-1 columns are recycled." Any suggestions on how to join my data into one data table would be greatly appreciated, thank you.

Upvotes: 2

Views: 94

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389155

You can use map_df from purrr to get data as a single dataframe.

read_excel_sheets <- function(excelDoc) {
  sheets <- readxl::excel_sheets(excelDoc)
  sheets <- sheets[4:6]
  x <- purrr::map_df(sheets, function(X) readxl::read_excel(excelDoc, sheet = X))
  return(x)
}

IMPORT <- purrr::map_df(rawfiles, FUN = read_excel_sheets)

You can also use do.call + rbind base R functions.

read_excel_sheets <- function(excelDoc) {
  sheets <- readxl::excel_sheets(excelDoc)
  sheets <- sheets[4:6]
  x <- do.call(rbind, lapply(sheets, function(X) readxl::read_excel(excelDoc, sheet = X)))
  return(x)
}

IMPORT <- do.call(rbind, lapply(rawfiles, FUN = read_excel_sheets))

Upvotes: 1

patalt
patalt

Reputation: 475

For that data.table::rbindlist approach just set fill=T:

library(data.table)

dt_list = lapply(5:10, function(i) {
  data.table(rnorm(i))
})
dt=rbindlist(dt_list, fill=T) 

Upvotes: 1

Limey
Limey

Reputation: 12506

You can use bind_rows from the dplyr package, part of the tidyverse:

x <- list(tibble(x=runif(5)),  tibble(x=runif(5)))
y <- list(tibble(x=runif(5)),  tibble(x=runif(5)))
z = list(x, y)

ans <- bind_rows(z)
ans
A tibble: 20 x 1
        x
    <dbl>
 1 0.746 
 2 0.0669
 3 0.612 
 4 0.0702
 5 0.990 
 6 0.301 
 7 0.177 
 8 0.799 
 9 0.242 
10 0.741 
11 0.651 
12 0.113 
13 0.927 
14 0.506 
15 0.477 
16 0.922 
17 0.217 
18 0.566 
19 0.539 
20 0.327 

Use (say) bind_rows(..., .id="Sheet") if you need to track which element of the list each row came from originally.

Upvotes: 0

Related Questions