Prevost

Reputation: 697

Import CSV file with spaces in header using read_csv from readr

I am starting to use readr to import CSV files with read_csv. How do I deal with CSV files whose header names contain spaces?

read_csv imports the names with the spaces (and special characters) intact, which prevents me from going straight to mutate and other dplyr functions.

How do I handle this?

Thanks!

Upvotes: 1

Views: 4123

Answers (3)

Pss

Reputation: 643

Another approach is to use the janitor::clean_names() function. It offers several naming conventions for rewriting column names so that they contain no spaces. The default is snake_case.
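A minimal sketch of that approach, assuming the janitor package is installed. The data frame and its column names are made up for illustration and stand in for whatever read_csv returns:

```r
library(janitor)

# Hypothetical data frame standing in for the result of read_csv()
df <- data.frame(x = 1, y = 2)
names(df) <- c("First Name", "Total Sales")

# clean_names() rewrites the column names; snake_case is the default
df <- clean_names(df)
names(df)
```

Because clean_names() takes a data frame and returns one, it can also sit directly after read_csv in a pipe.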

Upvotes: 0

D.sen

Reputation: 922

You could use make.names after you read in the data.

df <- data.frame(x=NA)
colnames(df) <- c("This col name has spaces")
colnames(df) <- make.names(colnames(df), unique=TRUE)

It will return column names with periods rather than spaces as separators.

colnames(df)
[1] "This.col.name.has.spaces"

According to the help page, make.names takes a character vector and returns a character vector of syntactically valid names, where:

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.

EDIT: Including an example with special characters.

df <- data.frame(x=NA)
colnames(df) <- c("Higher than 80(°F)")
colnames(df) <- make.names(colnames(df), unique=TRUE)

colnames(df)
[1] "Higher.than.80..F."

As you can see, make.names replaces 'illegal' characters with periods, which prevents syntax errors when you refer to a column by name directly.

If you want to collapse repeated periods into a single one, add:

colnames(df) <- gsub('(\\.)\\1+', '\\1', colnames(df))
colnames(df)
[1] "Higher.than.80.F."
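The steps above can be wrapped into one small helper. This is only a sketch: tidy_names is a hypothetical name, and dropping the trailing period is an extra step not in the answer above.

```r
# Hypothetical helper combining make.names with some period cleanup
tidy_names <- function(x) {
  x <- make.names(x, unique = TRUE)  # replace invalid characters with "."
  x <- gsub("\\.+", ".", x)          # collapse runs of periods into one
  x <- gsub("\\.$", "", x)           # drop a single trailing period
  x
}

tidy_names(c("Higher than 80(°F)", "This col name has spaces"))
# [1] "Higher.than.80.F"         "This.col.name.has.spaces"
```

Note that the cleanup runs after make.names has enforced uniqueness, so two different headers could in principle end up with the same name. If that matters, check the result with anyDuplicated().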

Upvotes: 2

j3ypi

Reputation: 1557

When I import a CSV containing spaces in the headers, I can still access the columns with the dollar operator, as long as I wrap the name in backticks. Let's say I have a data.frame (df) like this:

   a a b b
 1   1   1
 2   1   2

Where "a a" is the name of the first column and "b b" the name of the second, I can get the first column with

df$`a a`
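The same backtick quoting works inside dplyr verbs such as mutate, which is what the question was after. A small sketch, assuming dplyr is installed. The data frame is rebuilt here so the example is self-contained, and the total column is made up for illustration:

```r
library(dplyr)

# check.names = FALSE keeps the spaces in the column names
df <- data.frame(`a a` = c(1, 1), `b b` = c(1, 2), check.names = FALSE)

# Backticks let non-syntactic names be used like any other column
df <- mutate(df, total = `a a` + `b b`)
df$total
# [1] 2 3
```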

But if you want to change them anyway, you can just rename them like this:

names(df) <- c("a_a", "b_b")

The vector you're assigning just needs to have one element per column of the data.frame. A slightly more elegant way is to use the stringr package. If you want to replace all spaces with underscores, just type this:

library(stringr)
names(df) <- str_replace_all(names(df), " ", "_")
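If you would rather do the renaming inside a dplyr pipeline, the same replacement can be passed to rename_with. This is a sketch assuming dplyr 1.0 or later and stringr are installed, with the example data frame rebuilt so it runs on its own:

```r
library(dplyr)
library(stringr)

# check.names = FALSE keeps the spaces in the column names
df <- data.frame(`a a` = c(1, 1), `b b` = c(1, 2), check.names = FALSE)

# rename_with() applies the function to every column name by default
df <- df %>% rename_with(~ str_replace_all(.x, " ", "_"))
names(df)
# [1] "a_a" "b_b"
```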

Upvotes: 2
