Reputation: 141
The column names in my .xlsx Excel sheet have spaces. How can I automatically replace the spaces with "_" or "."? I would like to use read_excel
because I need to specify a range in my excel sheet.
Upvotes: 2
Views: 5102
Reputation: 144
The easiest one-liner is to add this to your read_excel
call:
, .name_repair = function(col){ gsub(" ", "_", col) }
As in:
data <- read_excel(path = excel_file,
.name_repair = function(col){ gsub(" ", "_", col) } )
In this way .name_repair
applies on the fly a simple function that uses gsub
to replace in each col
the space (" "
) with an underscore ("_"
).
Upvotes: 0
Reputation: 5138
Here is a way to do it using the .name_repair
argument with read_excel()
:
Creating the excel file for import:
# Example setup
mtcars = datasets::mtcars
names(mtcars) = paste(names(mtcars), LETTERS[1:length(mtcars)])
head(mtcars)
mpg A cyl B disp C hp D drat E wt F qsec G vs H am I gear J carb K
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
temp = tempfile(fileext = ".xlsx")
writexl::write_xlsx(mtcars, temp)
Reading in the data several different ways:
# With using the default for .name_repair ("unique" for read_excel())
head(readxl::read_excel(temp))
# A tibble: 6 x 11
`mpg A` `cyl B` `disp C` `hp D` `drat E` `wt F` `qsec G` `vs H` `am I` `gear J` `carb K`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
# Adding periods using using .name_repair
head(readxl::read_excel(temp, .name_repair = "universal"))
# A tibble: 6 x 11
mpg.A cyl.B disp.C hp.D drat.E wt.F qsec.G vs.H am.I gear.J carb.K
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
# Using a custom function to add underscores
head(readxl::read_excel(temp, .name_repair = function(x) gsub("\\s+", "_", x)))
# A tibble: 6 x 11
mpg_A cyl_B disp_C hp_D drat_E wt_F qsec_G vs_H am_I gear_J carb_K
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
# file.remove(temp)
Upvotes: 5
Reputation: 468
Using this .xlsx file:
I used the library openxlsx
. Using the read.xlsx
function I got the following dataframe:
a.long.col.name yet.another.second.column
1 a 1
2 b 2
3 c 3
4 d 4
read.xlsx
also contains options to select rows and columns from the excel sheet:
read.xlsx('C:/Users/<username>/Documents/so_query.xlsx',
rows = 2:5,
cols = 2)
Gives the following dataframe:
1
1 2
2 3
3 4
Upvotes: 0
Reputation: 4993
There is not way to tell it to do that in the method read_excel()
however you can do two things to circumvent the problem.
#create list of column names as you wish them to appear manually
names_vector<- c("name-1", "name_2"....)
#set the col_names argument equal to the list of names
data<-read_excel( <file_name>, col_names =names_vector)
These two steps will replace the header in your XLSX file with the corrected names.
Upvotes: 0