Mark Miller
Mark Miller

Reputation: 13103

Change letter case of column names

I have a large number of data sets each containing a long list of column names. In some files the column names are all capital letters and in some files only the first letter of the column names is capitalized. I need to append the data sets and thought the easiest way to match column names among data sets would be to convert the all-capital names into names with only the first letter capitalized.

I am hoping to find a general solution, maybe even a one-liner.

Here is my example data set. The desired names are included in the names statements.

my.data2 <-  "
landuse units grade CLAY    LINCOLN  BASINANDRANGE  MCCARTNEY  MAPLE
apple   acres AAA     0         2          3             4         6
apple   acres AA   1000       900         NA            NA       700
pear    acres AA   10.0        20         NA          30.0        40
peach   acres AAA   500       400        350           300       200
"
my.data2 <- read.table(textConnection(my.data2), header=TRUE)

names(my.data2)[names(my.data2)=="CLAY"]            <- "Clay"
names(my.data2)[names(my.data2)=="BASINANDRANGE"]   <- "BasinandRange"
names(my.data2)[names(my.data2)=="LINCOLN"]         <- "Lincoln"
names(my.data2)[names(my.data2)=="MCCARTNEY"]       <- "McCartney"
names(my.data2)[names(my.data2)=="MAPLE"]           <- "Maple"

my.data2

Note that I included the names McCartney and BasinandRange to make things more realistic and more difficult. However, if I can find a one-liner to deal with 95% of the names and use the above names statements to deal with complications like McCartney and BasinandRange that would be great.

I have searched the internet, including the StackOverflow archives, without finding a solution. Sorry if I overlooked one. Thank you for any help.

Upvotes: 31

Views: 108899

Answers (10)

HA LIM PARK
HA LIM PARK

Reputation: 59

This will make every column upper case.

dplyr::rename_with(names,toupper)

Upvotes: 4

Varun Tandra
Varun Tandra

Reputation: 331

Easy Solution

names(DF) <- base::toupper(names(DF))

Upvotes: 29

Jolyon
Jolyon

Reputation: 1

A recent 'tidy' solution I arrived at which renames all columns by making the first letter uppercase and doing nothing to the rest of the string.

Preliminaries:

library(tibble)  
library(dplyr)
library(stringr)

    
df <- tibble("column 1" = rep(1,10),
             "column 2 ABC" = rep("a",10),
             "Column 3 Abc" = rep("b", 10))
head(df)

  `column 1` `column 2 ABC` `Column 3 Abc`
       <dbl> <chr>          <chr>         
1          1 a              b             
2          1 a              b             
3          1 a              b             
4          1 a              b             
5          1 a              b             
6          1 a              b             

Rename columns using custom function:

df <- df %>%
    rename_with(function(x) str_replace(x, "^.", toupper))
    
head(df)

  `Column 1` `Column 2 ABC` `Column 3 Abc`
       <dbl> <chr>          <chr>         
1          1 a              b             
2          1 a              b             
3          1 a              b             
4          1 a              b             
5          1 a              b             
6          1 a              b             

The regex "^." can be understood as (based on https://regexr.com/7rp4k):

  • "^": beginning of the string
  • ".": matches any character

Upvotes: 0

linkonabe
linkonabe

Reputation: 846

data.table syntax, I believe would save more time and efficient. its also a one line statement, even shorter.

library(data.table)
setnames(my.data2, tolower(names(my.data2[4:8])))

#   landuse units grade clay lincoln basinandrange mccartney maple
#1:   apple acres   AAA    0       2             3         4     6
#2:   apple acres    AA 1000     900            NA        NA   700
#3:    pear acres    AA   10      20            NA        30    40
#4:   peach acres   AAA  500     400           350       300   200

Update 2023
Your question was to capitalise the first letter which we didn't quite address then. Here is an updated one-liner data.table approach.

setnames(my.data2, 4:8, tools::toTitleCase(names(my.data2)[4:8]))

Upvotes: 11

Will M
Will M

Reputation: 812

Combining two of the answers here, I've come up with an elegant tidy way:

This renames all column/variable names by capitalising the first letter of every word.

library(tidyverse)

my.data2 %>%
  rename_with(str_to_title)

Upvotes: 20

AlexB
AlexB

Reputation: 3269

Another option:

colnames(df) <- stringr::str_to_title(colnames(df))

Upvotes: 5

A "tidy" solution:

library(dplyr)

my.data2.mod <- my.data2 %>% 
  rename_at(c("CLAY", "LINCOLN", "BASINANDRANGE", "MCCARTNEY",  "MAPLE"),
            .funs = tolower)

names(my.data2.mod) 
# [1] "landuse"       "units"         "grade"         "clay"         
# [5] "lincoln"       "basinandrange" "mccartney"     "maple"   

Also, to answer the original question and leave some cases capitalized, you can use the snakecase package:

library(snakecase)

my.data2.mod = my.data2 %>% 
  rename_at(
    c("CLAY", "LINCOLN", "BASINANDRANGE", "MCCARTNEY",  "MAPLE"),
    .funs = list(
      ~ to_upper_camel_case(., 
                            abbreviations = c("McCartney", "BasinandRange")
                            )
      )
    )

names(my.data2.mod)
# [1] "landuse"       "units"         "grade"         "Clay"         
# [5] "Lincoln"       "BasinandRange" "McCartney"     "Maple" 

Upvotes: 8

ikashnitsky
ikashnitsky

Reputation: 3111

modern solution

This is now a job for janitor::clean_names(), just choose case parameter that fits you need.

Upvotes: 27

Mark Miller
Mark Miller

Reputation: 13103

I used Josh O'Brien's answer, but eventually wrote the code below that creates column names with the first letter in upper case and the other letters in lower case, with a few exceptions handled as in the original post. Below I used the same data set as in the original post, but read that data into R differently where n.col determines the number of columns in the data file:

n.col <- as.numeric(length(scan("c:/users/mark w miller/simple R programs/names_with_capital_letters.txt", 
         what="character", nlines=1)))

my.data2 <- read.table(file = "c:/users/mark w miller/simple R programs/names_with_capital_letters.txt", 
            na.string=NA, header = T, colClasses = c('character', 'character', 'character', 
            rep('numeric', (n.col[1] - 3))))

first.letter  <- substring(names(my.data2)[-1:-3], 1, 1)
other.letters <- tolower(substring(names(my.data2)[-1:-3], 2))
newnames      <- paste(first.letter, other.letters, sep="")

names(my.data2)[-1:-3] <- newnames
names(my.data2)[names(my.data2)=="Basinandrange"]   <- "BasinandRange"
names(my.data2)[names(my.data2)=="Mccartney"]       <- "McCartney"

my.data2

#   landuse units grade Clay Lincoln BasinandRange McCartney Maple
# 1   apple acres   AAA    0       2             3         4     6
# 2   apple acres    AA 1000     900            NA        NA   700
# 3    pear acres    AA   10      20            NA        30    40
# 4   peach acres   AAA  500     400           350       300   200

Upvotes: 3

Josh O&#39;Brien
Josh O&#39;Brien

Reputation: 162321

Here is a one-liner implementing "the easiest way to match column names among data sets" that I can think of:

## Columns 1:3 left unaltered since they are not place names.
names(my.data2)[-1:-3] <- tolower(names(my.data2)[-1:-3])

## View the results
names(my.data2)
# [1] "landuse"       "units"         "grade"         "clay"         
# [5] "lincoln"       "basinandrange" "mccartney"     "maple"   

Upvotes: 44

Related Questions