Reputation: 697
I am starting to use readr to import CSV files with read_csv... how do I deal with CSV files containing spaces in the header names? read_csv imports them with the spaces (and special characters), which prevents me from going straight to mutate and other dplyr functions.
How do I handle this?
Thanks!
Upvotes: 1
Views: 4123
Reputation: 643
Another approach is to use the janitor::clean_names() function. It offers several ways to transform column names so they no longer contain spaces; the default style is snake_case.
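A minimal sketch of two ways to use it, assuming janitor and readr (2.0 or later, which added the name_repair argument to read_csv) are installed. The CSV file and its column names are made up for illustration.

```r
library(janitor)
library(readr)

# Hypothetical CSV with spaces in the header, written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("First Name,Last Name", "Ada,Lovelace"), path)

# Option 1: read as usual, then clean the names afterwards
df <- read_csv(path, show_col_types = FALSE)
df <- clean_names(df)
names(df)
# "first_name" "last_name"

# Option 2: clean the names while reading; name_repair accepts a function
# that receives the raw header names and returns the repaired ones
df2 <- read_csv(path, name_repair = make_clean_names, show_col_types = FALSE)
names(df2)
# "first_name" "last_name"
```

Option 2 avoids a separate renaming step, so the data frame is ready for mutate() straight after import.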
Upvotes: 0
Reputation: 922
You could use make.names after you read in the data.
df <- data.frame(x=NA)
colnames(df) <- c("This col name has spaces")
colnames(df) <- make.names(colnames(df), unique=TRUE)
It will return column names with periods rather than spaces as separators.
colnames(df)
[1] "This.col.name.has.spaces"
According to its help page, make.names takes a character vector and makes each element syntactically valid, where:
"A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number."
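The rule above has a few consequences beyond swapping spaces for periods. This small base-R sketch, with made-up names, shows the leading-digit, reserved-word and duplicate cases:

```r
raw <- c("80 degrees", "if", "a a", "a.a", "total (%)")

make.names(raw)
# "X80.degrees" "if." "a.a" "a.a" "total...."
# A leading digit gets an "X" prefix, and reserved words get a trailing period

make.names(raw, unique = TRUE)
# "X80.degrees" "if." "a.a" "a.a.1" "total...."
# unique = TRUE appends a suffix so that "a a" and "a.a" stay distinct
```

This is why unique=TRUE is worth keeping: two different headers can otherwise collapse to the same name.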
EDIT: Including an example with special characters.
df <- data.frame(x=NA)
colnames(df) <- c("Higher than 80(°F)")
colnames(df) <- make.names(colnames(df), unique=TRUE)
colnames(df)
[1] "Higher.than.80..F."
As you can see, make.names replaces 'illegal' characters with periods, which prevents syntax errors when you refer to a column name directly.
If you want to collapse repeated periods into one, add:
colnames(df) <- gsub('(\\.)\\1+', '\\1', colnames(df))
colnames(df)
[1] "Higher.than.80.F."
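The same repair can also happen at import time. As a base-R comparison, read.csv() passes the header through make.names() because check.names = TRUE by default; in readr 2.0 and later, read_csv(path, name_repair = make.names) should give a similar result, but it is left out here to keep the sketch dependency-free. The file and header are made up for illustration.

```r
# Hypothetical CSV whose header contains spaces and parentheses
path <- tempfile(fileext = ".csv")
writeLines(c("a a,Higher than 80 (F)", "1,2"), path)

# check.names = TRUE (the default) runs the header through make.names()
df <- read.csv(path)
names(df)
# "a.a" "Higher.than.80..F."

# Collapse runs of periods; for periods this matches the backreference
# pattern above with a simpler regex
names(df) <- gsub("\\.+", ".", names(df))
names(df)
# "a.a" "Higher.than.80.F."
```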
Upvotes: 2
Reputation: 1557
When I import a CSV containing spaces in the headers, I can still access the columns with the dollar operator by wrapping the name in backticks. Let's say I have a data.frame (df) like this:
  a a b b
1   1   1
2   1   2
where "a a" is the name of the first column and "b b" the name of the second. I can get the first column with
df$`a a`
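Backticks work the same way inside dplyr verbs, which is what the question is after. A small sketch, assuming dplyr is installed and rebuilding the two-column example above:

```r
library(dplyr)

# check.names = FALSE keeps the non-syntactic names as they are
df <- data.frame(`a a` = c(1, 1), `b b` = c(1, 2), check.names = FALSE)

# Backticks let mutate() and rename() refer to the columns directly
df2 <- df %>%
  mutate(total = `a a` + `b b`) %>%
  rename(a_a = `a a`, b_b = `b b`)
names(df2)
# "a_a" "b_b" "total"
```

So renaming is optional: backticks are enough if you only need a few columns.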
If you want to change them anyway, you can rename them like this:
names(df) <- c("a_a", "b_b")
The vector you assign must have one element per column of the data.frame. A slightly more elegant way is to use the stringr package. To replace all spaces with underscores:
library(stringr)
names(df) <- str_replace_all(names(df), " ", "_")
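If you would rather not add a package, base R's gsub() does the same replacement. Note that replacing spaces leaves other special characters untouched, so a name can still be non-syntactic afterwards; comparing each name with its make.names() version is one way to spot those. The column names below are made up.

```r
# Base-R equivalent of the stringr call above
df <- data.frame(`a a` = 1, `b b (%)` = 2, check.names = FALSE)
names(df) <- gsub(" ", "_", names(df), fixed = TRUE)
names(df)
# "a_a" "b_b_(%)"

# Names that make.names() would still change are not syntactic yet
names(df)[make.names(names(df)) != names(df)]
# "b_b_(%)"
```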
Upvotes: 2