Reputation: 13103
I have a large number of data sets each containing a long list of column names. In some files the column names are all capital letters and in some files only the first letter of the column names is capitalized. I need to append the data sets and thought the easiest way to match column names among data sets would be to convert the all-capital names into names with only the first letter capitalized.
I am hoping to find a general solution, maybe even a one-liner.
Here is my example data set. The desired names are included in the names
statements.
my.data2 <- "
landuse units grade CLAY LINCOLN BASINANDRANGE MCCARTNEY MAPLE
apple acres AAA 0 2 3 4 6
apple acres AA 1000 900 NA NA 700
pear acres AA 10.0 20 NA 30.0 40
peach acres AAA 500 400 350 300 200
"
my.data2 <- read.table(textConnection(my.data2), header=TRUE)
names(my.data2)[names(my.data2)=="CLAY"] <- "Clay"
names(my.data2)[names(my.data2)=="BASINANDRANGE"] <- "BasinandRange"
names(my.data2)[names(my.data2)=="LINCOLN"] <- "Lincoln"
names(my.data2)[names(my.data2)=="MCCARTNEY"] <- "McCartney"
names(my.data2)[names(my.data2)=="MAPLE"] <- "Maple"
my.data2
Note that I included the names McCartney
and BasinandRange
to make things more realistic and more difficult. However, if I can find a one-liner to deal with 95% of the names and use the above names
statements to deal with complications like McCartney
and BasinandRange
that would be great.
I have searched the internet, including the StackOverflow archives, without finding a solution. Sorry if I overlooked one. Thank you for any help.
Upvotes: 31
Views: 108899
Reputation: 59
This will make every column upper case.
dplyr::rename_with(names,toupper)
Upvotes: 4
Reputation: 1
A recent 'tidy' solution I arrived at which renames all columns by making the first letter uppercase and doing nothing to the rest of the string.
Preliminaries:
library(tibble)
library(dplyr)
library(stringr)
df <- tibble("column 1" = rep(1,10),
"column 2 ABC" = rep("a",10),
"Column 3 Abc" = rep("b", 10))
head(df)
`column 1` `column 2 ABC` `Column 3 Abc`
<dbl> <chr> <chr>
1 1 a b
2 1 a b
3 1 a b
4 1 a b
5 1 a b
6 1 a b
Rename columns using custom function:
df <- df %>%
rename_with(function(x) str_replace(x, "^.", toupper))
head(df)
`Column 1` `Column 2 ABC` `Column 3 Abc`
<dbl> <chr> <chr>
1 1 a b
2 1 a b
3 1 a b
4 1 a b
5 1 a b
6 1 a b
The regex "^." can be understood as (based on https://regexr.com/7rp4k):
Upvotes: 0
Reputation: 846
data.table syntax, I believe would save more time and efficient. its also a one line statement, even shorter.
library(data.table)
setnames(my.data2, tolower(names(my.data2[4:8])))
# landuse units grade clay lincoln basinandrange mccartney maple
#1: apple acres AAA 0 2 3 4 6
#2: apple acres AA 1000 900 NA NA 700
#3: pear acres AA 10 20 NA 30 40
#4: peach acres AAA 500 400 350 300 200
Update 2023
Your question was to capitalise the first letter which we didn't quite address then. Here is an updated one-liner data.table
approach.
setnames(my.data2, 4:8, tools::toTitleCase(names(my.data2)[4:8]))
Upvotes: 11
Reputation: 812
Combining two of the answers here, I've come up with an elegant tidy
way:
This renames all column/variable names by capitalising the first letter of every word.
library(tidyverse)
my.data2 %>%
rename_with(str_to_title)
Upvotes: 20
Reputation: 537
A "tidy" solution:
library(dplyr)
my.data2.mod <- my.data2 %>%
rename_at(c("CLAY", "LINCOLN", "BASINANDRANGE", "MCCARTNEY", "MAPLE"),
.funs = tolower)
names(my.data2.mod)
# [1] "landuse" "units" "grade" "clay"
# [5] "lincoln" "basinandrange" "mccartney" "maple"
Also, to answer the original question and leave some cases capitalized, you can use the snakecase
package:
library(snakecase)
my.data2.mod = my.data2 %>%
rename_at(
c("CLAY", "LINCOLN", "BASINANDRANGE", "MCCARTNEY", "MAPLE"),
.funs = list(
~ to_upper_camel_case(.,
abbreviations = c("McCartney", "BasinandRange")
)
)
)
names(my.data2.mod)
# [1] "landuse" "units" "grade" "Clay"
# [5] "Lincoln" "BasinandRange" "McCartney" "Maple"
Upvotes: 8
Reputation: 3111
This is now a job for janitor::clean_names()
, just choose case
parameter that fits you need.
Upvotes: 27
Reputation: 13103
I used Josh O'Brien's answer, but eventually wrote the code below that creates column names with the first letter
in upper case and the other letters in lower case, with a few exceptions handled as in the original post. Below I used the same data set as in the original post, but read that data into R differently where n.col
determines the number of columns in the data file:
n.col <- as.numeric(length(scan("c:/users/mark w miller/simple R programs/names_with_capital_letters.txt",
what="character", nlines=1)))
my.data2 <- read.table(file = "c:/users/mark w miller/simple R programs/names_with_capital_letters.txt",
na.string=NA, header = T, colClasses = c('character', 'character', 'character',
rep('numeric', (n.col[1] - 3))))
first.letter <- substring(names(my.data2)[-1:-3], 1, 1)
other.letters <- tolower(substring(names(my.data2)[-1:-3], 2))
newnames <- paste(first.letter, other.letters, sep="")
names(my.data2)[-1:-3] <- newnames
names(my.data2)[names(my.data2)=="Basinandrange"] <- "BasinandRange"
names(my.data2)[names(my.data2)=="Mccartney"] <- "McCartney"
my.data2
# landuse units grade Clay Lincoln BasinandRange McCartney Maple
# 1 apple acres AAA 0 2 3 4 6
# 2 apple acres AA 1000 900 NA NA 700
# 3 pear acres AA 10 20 NA 30 40
# 4 peach acres AAA 500 400 350 300 200
Upvotes: 3
Reputation: 162321
Here is a one-liner implementing "the easiest way to match column names among data sets" that I can think of:
## Columns 1:3 left unaltered since they are not place names.
names(my.data2)[-1:-3] <- tolower(names(my.data2)[-1:-3])
## View the results
names(my.data2)
# [1] "landuse" "units" "grade" "clay"
# [5] "lincoln" "basinandrange" "mccartney" "maple"
Upvotes: 44