Nettle
Nettle

Reputation: 3321

Standardize column names in excel sheets before combining with purrr and readxl

I would like to compile an Excel file with multiple tabs labeled by year (2016, 2015, 2014, etc). Each tab has identical data, but column names may be spelled differently from year-to-year.

I would like to standardize columns in each sheet before combining.

This is the generic way of combining using purrr and readxl for such tasks:

combined.df <- excel_sheets(my.file) %>% 
  set_names() %>%                                 
  map_dfr(read_excel, path = my.file, .id = "sheet") 

...however as noted, this creates separate columns for "COLUMN ONE", and "Column One", which have the same data.

Inserting make.names into the pipeline would probably be the best solution.

Keeping it all together would be ideal...something like:

   combined.df <- excel_sheets(my.file) %>% 
    set_names() %>% 
    map(read_excel, path = my.file) %>% 
    map(~(names(.) %>%  #<---WRONG
            make.names() %>% 
            str_to_upper() %>% 
            str_trim() %>% 
            set_names()) ) 

..but the syntax is all wrong.

Upvotes: 0

Views: 485

Answers (2)

Mirabilis
Mirabilis

Reputation: 511

Rather than defining your own function, the clean_names function from the janitor package may be able to help you. It takes a dataframe/tibble as an input and returns a dataframe/tibble with clean names as an output.

Here's an example:

library(tidyverse)

tibble(" a col name" = 1,
       "another-col-NAME" = 2,
       "yet another name  " = 3) %>% 
    janitor::clean_names()
#> # A tibble: 1 x 3
#>   a_col_name another_col_name yet_another_name
#>        <dbl>            <dbl>            <dbl>
#> 1          1                2                3

You can then plop it right into the code you gave:

combined.df <- excel_sheets(my.file) %>% 
    set_names() %>%
    map(read_excel, path = my.file) %>%  #<Import as list, not dfr
    map(janitor::clean_names) %>%        #<janitor::clean_names
    bind_rows(.id = "sheet")

Upvotes: 1

Nettle
Nettle

Reputation: 3321

Creating a new function is doable but is verbose and uses two maps:

  # User defined function: col_rename
  col_rename <- function(df){
    names(df) <- names(df) %>% 
     str_to_upper() %>% 
     make.names() %>% 
     str_trim()
   return(df)
  }

   combined.df <- excel_sheets(my.file) %>% 
    set_names() %>%
    map(read_excel, path = my.file) %>%  #<Import as list, not dfr
    map(col_rename) %>%                  #<Fix colnames (user defined function)
    bind_rows(.id = "sheet")

Upvotes: 0

Related Questions