Alina HE
Alina HE

Reputation: 31

load multiple tables in one spreadsheet into R

I have an Excel worksheet containing multiple tables that I want to import into R (import as multiple data frame). I know how to do that if these tables are stored in different tabs, but the raw file that I'm using has several tables in one tab.

Part view of my spreadsheet

I'm wondering if it is possible to have R read each table into a separate data frame by a certain function. Say once R reads a blank row, it just stops and imports all above rows into one table, then restart the process to read the second table, for example. Or do I have to manually save each table into a separate tab/sheet to let R read them?

Note: I cannot use nrow=n in the read.table package as the row of each table might be changing. I need to find a more universal way to make the whole process automatic, regardless of the structure of each table.

Upvotes: 2

Views: 3875

Answers (2)

crazybilly
crazybilly

Reputation: 3092

If you reliably have blank rows delimiting your sheets, you might try reading each tab in as a big table, then splitting it that way.

library(tidyverse)
library(readxl)

# read the whole thing into a single file
wholeworksheet  <- read_excel('myfile.xlsx')

# find the blank rows
blankrows  <- data_frame(
    blanks = which(is.na(wholeworksheet[1]))
  ) %>% 
  mutate(
      dif = blanks - lag(blanks)
    , rownum = row_number()

    # maybe someone can suggest a better way to handle using dplyr::lag() 
    , startrow = ifelse(rownum == 1, 1, NA)
    , startrow = coalesce(ifelse(dif == 1, lag(startrow, default =1), lag(blanks + 1)), 1)
  )

# get the end rows of each table
endrows  <- blankrows %>% 
  group_by(startrow) %>% 
  summarize(
    endrow = min(blanks)
  )

# combine start and end rows into a single table
tableindex <- blankrows %>% 
  left_join(endrows, by = 'startrow') %>% 
  distinct(startrow, endrow)

# the last blank row is probably just before the last table in the sheet
if(nrow(wholeworksheet) > max(blankrows$blanks)) {

  lasttable  <- data_frame(startrow = max(blankrows$blanks) + 1, endrow = nrow(wholeworksheet))
  tableindex  <- tableindex %>% 
    bind_rows(lasttable)
}

# split your tables up into a list of tables 
alistoftables  <- map(1:nrow(tableindex), ~ wholeworksheet[tableindex$startrow[.x]:tableindex$endrow[.x] , ]  )

Upvotes: 3

struggles
struggles

Reputation: 865

Could you manually copy/paste each table into it's own tab and then export each tab into a CSV or TSV? That is what I would normally do.

If that is too much work, then @joran's suggestion will probably work. You could use stringr::str_detect() to identify when a column or row is empty or not empty and then plug that information into the startRow, startCol, endRow, endCol parameters.

Upvotes: 0

Related Questions