Reputation: 31
I have an Excel worksheet containing multiple tables that I want to import into R (import as multiple data frame). I know how to do that if these tables are stored in different tabs, but the raw file that I'm using has several tables in one tab.
I'm wondering if it is possible to have R read each table into a separate data frame by a certain function. Say once R reads a blank row, it just stops and imports all above rows into one table, then restart the process to read the second table, for example. Or do I have to manually save each table into a separate tab/sheet to let R read them?
Note: I cannot use nrow=n
in the read.table
package as the row of each table might be changing. I need to find a more universal way to make the whole process automatic, regardless of the structure of each table.
Upvotes: 2
Views: 3875
Reputation: 3092
If you reliably have blank rows delimiting your sheets, you might try reading each tab in as a big table, then splitting it that way.
library(tidyverse)
library(readxl)
# read the whole thing into a single file
wholeworksheet <- read_excel('myfile.xlsx')
# find the blank rows
blankrows <- data_frame(
blanks = which(is.na(wholeworksheet[1]))
) %>%
mutate(
dif = blanks - lag(blanks)
, rownum = row_number()
# maybe someone can suggest a better way to handle using dplyr::lag()
, startrow = ifelse(rownum == 1, 1, NA)
, startrow = coalesce(ifelse(dif == 1, lag(startrow, default =1), lag(blanks + 1)), 1)
)
# get the end rows of each table
endrows <- blankrows %>%
group_by(startrow) %>%
summarize(
endrow = min(blanks)
)
# combine start and end rows into a single table
tableindex <- blankrows %>%
left_join(endrows, by = 'startrow') %>%
distinct(startrow, endrow)
# the last blank row is probably just before the last table in the sheet
if(nrow(wholeworksheet) > max(blankrows$blanks)) {
lasttable <- data_frame(startrow = max(blankrows$blanks) + 1, endrow = nrow(wholeworksheet))
tableindex <- tableindex %>%
bind_rows(lasttable)
}
# split your tables up into a list of tables
alistoftables <- map(1:nrow(tableindex), ~ wholeworksheet[tableindex$startrow[.x]:tableindex$endrow[.x] , ] )
Upvotes: 3
Reputation: 865
Could you manually copy/paste
each table into it's own tab and then export each tab into a CSV or TSV? That is what I would normally do.
If that is too much work, then @joran's suggestion will probably work. You could use stringr::str_detect()
to identify when a column or row is empty or not empty and then plug that information into the startRow
, startCol
, endRow
, endCol
parameters.
Upvotes: 0