nohomejerome
nohomejerome

Reputation: 141

Remove automatically all spaces from column names using read_excel

The column names in my .xlsx Excel sheet have spaces. How can I automatically replace the spaces with "_" or "."? I would like to use read_excel because I need to specify a range in my excel sheet.

Upvotes: 2

Views: 5102

Answers (4)

Ni-Ar
Ni-Ar

Reputation: 144

The easiest one-liner is to add this to your read_excel call:

, .name_repair = function(col){ gsub(" ", "_", col) }

As in:


data <- read_excel(path = excel_file,
                  .name_repair = function(col){ gsub(" ", "_", col) } )

In this way .name_repair applies on the fly a simple function that uses gsub to replace in each col the space (" ") with an underscore ("_").

Upvotes: 0

Andrew
Andrew

Reputation: 5138

Here is a way to do it using the .name_repair argument with read_excel():

Creating the excel file for import:

# Example setup
mtcars = datasets::mtcars
names(mtcars) = paste(names(mtcars), LETTERS[1:length(mtcars)])
head(mtcars)
                  mpg A cyl B disp C hp D drat E  wt F qsec G vs H am I gear J carb K
Mazda RX4          21.0     6    160  110   3.90 2.620  16.46    0    1      4      4
Mazda RX4 Wag      21.0     6    160  110   3.90 2.875  17.02    0    1      4      4
Datsun 710         22.8     4    108   93   3.85 2.320  18.61    1    1      4      1
Hornet 4 Drive     21.4     6    258  110   3.08 3.215  19.44    1    0      3      1
Hornet Sportabout  18.7     8    360  175   3.15 3.440  17.02    0    0      3      2
Valiant            18.1     6    225  105   2.76 3.460  20.22    1    0      3      1

temp = tempfile(fileext = ".xlsx")
writexl::write_xlsx(mtcars, temp)

Reading in the data several different ways:

# With using the default for .name_repair ("unique" for read_excel())
head(readxl::read_excel(temp))
# A tibble: 6 x 11
  `mpg A` `cyl B` `disp C` `hp D` `drat E` `wt F` `qsec G` `vs H` `am I` `gear J` `carb K`
    <dbl>   <dbl>    <dbl>  <dbl>    <dbl>  <dbl>    <dbl>  <dbl>  <dbl>    <dbl>    <dbl>
1    21         6      160    110     3.9    2.62     16.5      0      1        4        4
2    21         6      160    110     3.9    2.88     17.0      0      1        4        4
3    22.8       4      108     93     3.85   2.32     18.6      1      1        4        1
4    21.4       6      258    110     3.08   3.22     19.4      1      0        3        1
5    18.7       8      360    175     3.15   3.44     17.0      0      0        3        2
6    18.1       6      225    105     2.76   3.46     20.2      1      0        3        1


# Adding periods using using .name_repair
head(readxl::read_excel(temp, .name_repair = "universal"))
# A tibble: 6 x 11
  mpg.A cyl.B disp.C  hp.D drat.E  wt.F qsec.G  vs.H  am.I gear.J carb.K
  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl>
1  21       6    160   110   3.9   2.62   16.5     0     1      4      4
2  21       6    160   110   3.9   2.88   17.0     0     1      4      4
3  22.8     4    108    93   3.85  2.32   18.6     1     1      4      1
4  21.4     6    258   110   3.08  3.22   19.4     1     0      3      1
5  18.7     8    360   175   3.15  3.44   17.0     0     0      3      2
6  18.1     6    225   105   2.76  3.46   20.2     1     0      3      1

# Using a custom function to add underscores
head(readxl::read_excel(temp, .name_repair = function(x) gsub("\\s+", "_", x)))
# A tibble: 6 x 11
  mpg_A cyl_B disp_C  hp_D drat_E  wt_F qsec_G  vs_H  am_I gear_J carb_K
  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl>
1  21       6    160   110   3.9   2.62   16.5     0     1      4      4
2  21       6    160   110   3.9   2.88   17.0     0     1      4      4
3  22.8     4    108    93   3.85  2.32   18.6     1     1      4      1
4  21.4     6    258   110   3.08  3.22   19.4     1     0      3      1
5  18.7     8    360   175   3.15  3.44   17.0     0     0      3      2
6  18.1     6    225   105   2.76  3.46   20.2     1     0      3      1

# file.remove(temp)

Upvotes: 5

Matt
Matt

Reputation: 468

Using this .xlsx file:

enter image description here

I used the library openxlsx. Using the read.xlsx function I got the following dataframe:

     a.long.col.name yet.another.second.column
1               a                         1
2               b                         2
3               c                         3
4               d                         4

read.xlsx also contains options to select rows and columns from the excel sheet:

read.xlsx('C:/Users/<username>/Documents/so_query.xlsx',
      rows  = 2:5,
      cols = 2)

Gives the following dataframe:

  1
1 2
2 3
3 4

Upvotes: 0

sconfluentus
sconfluentus

Reputation: 4993

There is not way to tell it to do that in the method read_excel() however you can do two things to circumvent the problem.

#create list of column names as you wish them to appear manually
names_vector<- c("name-1", "name_2"....)

#set the col_names argument equal to the list of names
data<-read_excel( <file_name>, col_names =names_vector)

These two steps will replace the header in your XLSX file with the corrected names.

Upvotes: 0

Related Questions