Reputation: 6649
I want to rename one column of a data frame, referencing it by its current name. Is this possible? For example, say I had the table 'data':
a b c
1 2 2
3 2 3
4 1 2
and I wanted to update the name of column b to 'd'. I know I could use
colnames(data)[2] <- 'd'
but can I make the change by specifically referencing b, i.e. something like
colnames(data)['b'] <- 'd'
so that if the column ordering of the dataframe changes the correct column name will still be updated.
Thanks in advance
Upvotes: 18
Views: 49810
Reputation: 23034
As of October 2014 this can now be done easily in the dplyr package:
rename(data, d = b)
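Note that rename() returns a modified copy rather than changing data in place, so the result has to be assigned back. A minimal sketch, assuming dplyr is installed and using the table from the question:

```r
library(dplyr)

# The example table from the question
data <- data.frame(a = c(1, 3, 4), b = c(2, 2, 1), c = c(2, 3, 2))

# rename() takes new_name = old_name and refers to the column by name,
# so the column's position in the data frame does not matter
data <- rename(data, d = b)

names(data)
```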
Upvotes: 16
Reputation: 59612
There is a function setnames built into package data.table for exactly that.
setnames(DT, "b", "d")
It changes the names by reference with no copy at all. Any other method using names(data)<- or names(data)[i]<- or similar will copy the entire object, usually several times, even though all you're doing is changing a column name.
DT must be of type data.table for setnames to work, though, so you'd need to switch to data.table, or convert using as.data.table, to use it.
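A minimal sketch of the conversion route, assuming data.table is installed (DF and DT here are illustrative names only):

```r
library(data.table)

DF <- data.frame(a = 1:2, b = 3:4)   # ordinary data.frame
DT <- as.data.table(DF)              # convert to a data.table

# Rename by old name; setnames() modifies DT in place,
# so there is no assignment on this line
setnames(DT, "b", "d")

names(DT)
```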
Here is the extract from ?setnames. The intention is that you run example(setnames) at the prompt; the comments then relate to the copies you see reported by tracemem.
DF = data.frame(a=1:2,b=3:4) # base data.frame to demo copies
tracemem(DF)
colnames(DF)[1] <- "A" # 4 copies of entire object
names(DF)[1] <- "A" # 3 copies of entire object
names(DF) <- c("A", "b") # 2 copies of entire object
`names<-`(DF,c("A","b")) # 1 copy of entire object
x=`names<-`(DF,c("A","b")) # still 1 copy (so not print method)
# What if DF is large, say 10GB in RAM. Copy 10GB just to change a column name?
DT = data.table(a=1:2,b=3:4,c=5:6)
tracemem(DT)
setnames(DT,"b","B") # by name; no match() needed. No copy.
setnames(DT,3,"C") # by position. No copy.
setnames(DT,2:3,c("D","E")) # multiple. No copy.
setnames(DT,c("a","E"),c("A","F")) # multiple by name. No copy.
setnames(DT,c("X","Y","Z")) # replace all. No copy.
Upvotes: 30
Reputation: 36120
I disagree with @Chase - the grepl solution ain't the luckiest one. I'd say: go with a simple ==. Here's why:
d <- data.frame(matrix(rnorm(100), 10))
colnames(d) <- replicate(10, paste(sample(letters[1:5], size = 5, replace=TRUE, prob=c(.1, .6, .1, .1, .1)), collapse = ""))
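Now try grepl("b", colnames(d)): it returns TRUE for every name that contains a "b", not only for a column named exactly "b". Passing fixed = TRUE only turns off regex interpretation and still matches substrings, so either anchor the pattern ("^b$") or, even better, use a plain colnames(d) == "b" like @joran suggested. Regex matching is also generally slower than ==, so for simple tasks like this you may want to use ==.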
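A small sketch of the difference between substring matching and exact comparison, using made-up column names (nms is hypothetical, not taken from the answer):

```r
# Hypothetical column names, chosen so that several contain the letter "b"
nms <- c("a", "b", "ab", "bb")

sub_match   <- grepl("b", nms)                # matches any name containing "b"
fixed_match <- grepl("b", nms, fixed = TRUE)  # literal string, but still a substring match
anchored    <- grepl("^b$", nms)              # anchored regex: whole name must be "b"
exact       <- nms == "b"                     # plain equality: whole name must be "b"

sub_match
exact
```

Only the anchored pattern and == single out the one column that is named exactly "b".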
Upvotes: 1
Reputation: 69231
This seems like a hack, but the first thing that came to mind was to use grepl() with a sufficiently detailed search string to only get the column you want. I'm sure there are better options:
dat <- data.frame(a = 1:3, b = 1:3, c = 1:3)
colnames(dat)[grepl("b", colnames(dat))] <- "foo"
dat
#------
  a foo c
1 1   1 1
2 2   2 2
3 3   3 3
As Joran points out below, I overcomplicated things: there is no need for a regex at all. This saves a few characters of typing too.
colnames(dat)[colnames(dat) == "foo"] <- "bar"
#------
  a bar c
1 1   1 1
2 2   2 2
3 3   3 3
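One caveat with the == form is that it silently does nothing when no column has the given name. A possible variant, sketched here with match() and an explicit check (this guard is an addition for illustration, not part of the answer above):

```r
dat <- data.frame(a = 1:3, b = 1:3, c = 1:3)

idx <- match("b", colnames(dat))   # position of column "b", or NA if it is absent
stopifnot(!is.na(idx))             # fail loudly instead of renaming nothing
colnames(dat)[idx] <- "d"

colnames(dat)
```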
Upvotes: 13
Reputation: 110034
Yes, but it's more difficult (as far as I know) than numeric indexing. I'm going to provide a dirty function that will do this; if you want to see how it works, just tear the function apart line by line:
rename <- function(df, column, new){
    x <- names(df)                               # Did this to avoid typing twice
    if (is.numeric(column)) column <- x[column]  # Take numeric input by indexing
    names(df)[x %in% column] <- new              # What you're interested in
    return(df)
}
# try it out
rename(mtcars, 'mpg', 'NEW')
rename(mtcars, 1, 'NEW')
Upvotes: 4