MelkorNO
MelkorNO

Reputation: 3

Importing Excel-tables in R

Is there a way to import a named Excel-table into R as a data.frame?

I typically have several named Excel-tables on a single worksheet, that I want to import as data.frames, without relying on static row - and column references for the location of the Excel-tables.

I have tried to set namedRegion which is an available argument for several Excel-import functions, but that does not seem to work for named Excel-tables. I am currently using the openxlxs package, which has a function getTables() that creates a variable with Excel-table names from a single worksheet, but not the data in the tables.

Upvotes: 0

Views: 2699

Answers (5)

ovancantfort
ovancantfort

Reputation: 11

I would like to add something to this old thread. The library openxlsx2 allows to access the structure of a workbook by storing it in a workbook object.

library(openxlsx2)
wb <- wb_load("Excel_file.xlsx")

You can then access the list of tables. There is no direct function for this but is can be obtained this way.

 named_tables <- wb[["tables"]][["tab_name"]]

You can then read the named tables by using the wb_to_df function with the named_region argument. If no sheet argument is supplied, the first region matching the name will be supplied. As tables have unique names at workbook level, no need to determine the sheet.

list_df <- list()
for (table_name in named_tables) {
    table_data <- wb_to_df(wb, named_region = table_name)
    list_df <- append(list_df,list(table_data))
}

This works well, even for huge workbooks (tried it with a 160mb workbook with 3 tables of 1 million rows, most other packages would crash, but not this one), but is not very fast compared to other data imports. I actually used it to turn these huge workbooks to Parquet for faster data treatment.

Hope it can be useful info for others.

Upvotes: 1

AlexDinahl
AlexDinahl

Reputation: 31

This link is might be useful for you

https://stackoverflow.com/a/17709204/10235327

1. Install XLConnect package
2. Save a path to your file in a variable
3. Load workbook
4. Save your data to df

To get table names you can use function

getTables(wb,sheet=1,simplify=T)

Where:

  • wb - your workbook
  • sheet - sheet name or might be the number as well
  • simplify = TRUE (default) the result is simplified to a vector

https://rdrr.io/cran/XLConnect/man/getTables-methods.html

Here's the code (not mine, copied from the topic above, just a bit modified)

require(XLConnect)
sampleFile = "C:/Users/your.name/Documents/test.xlsx"
wb = loadWorkbook(sampleFile)
myTable <- getTables(wb,sheet=1)
df<-readTable(wb, sheet = 1, table = myTable)

Upvotes: 1

ricoderks
ricoderks

Reputation: 1619

To get your named table is a little bit of work.

First you need to load the workbook.

library(openxlsx)

wb <- loadWorkbook("name_excel_file.xlsx")

Next you need to extract the name of your named table.

# get the name and the range
tables <- getTables(wb = wb,
                    sheet = 1)

If you have multiple named tables they are all in tables. My named table is called Table1.

Next you to extract the column numbers and row numbers, which you will later use to extract the named table from the Excel file.

# get the range
table_range <- names(tables[tables == "Table1"])
table_range_refs <- strsplit(table_range, ":")[[1]]

# use a regex to extract out the row numbers
table_range_row_num <- gsub("[^0-9.]", "", table_range_refs)
# extract out the column numbers
table_range_col_num <- convertFromExcelRef(table_range_refs)

Now you re-read the Excel file with the cols and rows parameter.

# finally read it
my_df <- read.xlsx(xlsxFile = "name_excel_file.xlsx",
                   sheet = 1,
                   cols = table_range_col_num[1]:table_range_col_num[2],
                   rows = table_range_row_num[1]:table_range_row_num[2])

You end up with a data frame with only the content of your named table.

I used this a while ago. I found this code somewhere, but I don't know anymore from where.

Upvotes: 1

efz
efz

Reputation: 435

I use read.xlsx from package openxlsx. For example:

library(openxlsx)
fileA <- paste0(some.directory,'excel.file.xlsx')
A <-  read.xlsx(fileA, startRow = 3)

hope it helps

Upvotes: 0

Duck
Duck

Reputation: 39585

You can check next packages:

library(xlsx)
Data <- read.xlsx('YourFile.xlsx',sheet=1)

library(readxl)
Data <- read_excel('YourFile.xlsx',sheet=1)

Both options allow you to define specific regions to load the data into R.

Upvotes: 0

Related Questions