Reputation: 3321
I would like to compile an Excel file with multiple tabs labeled by year (2016, 2015, 2014, etc). Each tab has identical data, but column names may be spelled differently from year-to-year.
I would like to standardize columns in each sheet before combining.
This is the generic way of combining using purrr
and readxl
for such tasks:
combined.df <- excel_sheets(my.file) %>%
set_names() %>%
map_dfr(read_excel, path = my.file, .id = "sheet")
...however as noted, this creates separate columns for "COLUMN ONE", and "Column One", which have the same data.
Inserting make.names
into the pipeline would probably be the best solution.
Keeping it all together would be ideal...something like:
combined.df <- excel_sheets(my.file) %>%
set_names() %>%
map(read_excel, path = my.file) %>%
map(~(names(.) %>% #<---WRONG
make.names() %>%
str_to_upper() %>%
str_trim() %>%
set_names()) )
..but the syntax is all wrong.
Upvotes: 0
Views: 485
Reputation: 511
Rather than defining your own function, the clean_names
function from the janitor package may be able to help you. It takes a dataframe/tibble as an input and returns a dataframe/tibble with clean names as an output.
Here's an example:
library(tidyverse)
tibble(" a col name" = 1,
"another-col-NAME" = 2,
"yet another name " = 3) %>%
janitor::clean_names()
#> # A tibble: 1 x 3
#> a_col_name another_col_name yet_another_name
#> <dbl> <dbl> <dbl>
#> 1 1 2 3
You can then plop it right into the code you gave:
combined.df <- excel_sheets(my.file) %>%
set_names() %>%
map(read_excel, path = my.file) %>% #<Import as list, not dfr
map(janitor::clean_names) %>% #<janitor::clean_names
bind_rows(.id = "sheet")
Upvotes: 1
Reputation: 3321
Creating a new function is doable but is verbose and uses two maps:
# User defined function: col_rename
col_rename <- function(df){
names(df) <- names(df) %>%
str_to_upper() %>%
make.names() %>%
str_trim()
return(df)
}
combined.df <- excel_sheets(my.file) %>%
set_names() %>%
map(read_excel, path = my.file) %>% #<Import as list, not dfr
map(col_rename) %>% #<Fix colnames (user defined function)
bind_rows(.id = "sheet")
Upvotes: 0