Reputation: 293
I'm trying to replace some (but not all) of the column names in a data frame with more descriptive labels. I have a vector of long names and need to match each one to its column and replace that column's current name.
In more detail:
I have a data frame with both text and numeric columns, e.g.:
df <- data.frame(text1 = c("nnnn", "uuuu", "ooo"),
                 text2 = c("b", "t", "eee"),
                 a1 = c(1, 2, 3),
                 a2 = c(45, 43, 23),
                 b1 = c(43, 6, 2),
                 text3 = c("gg", "ll", "jj"))
So it looks like this:
df
  text1 text2 a1 a2 b1 text3
1  nnnn     b  1 45 43    gg
2  uuuu     t  2 43  6    ll
3   ooo   eee  3 23  2    jj
I also have a vector of longer labels for some of the columns:
longnames <- c("a1 age", "a2 gender", "b1 postcode")
Wherever there is a matching long name, I would like to completely replace the corresponding short name in df. So my desired output is:
  text1 text2 a1 age a2 gender b1 postcode text3
1  nnnn     b      1        45          43    gg
2  uuuu     t      2        43           6    ll
3   ooo   eee      3        23           2    jj
Every short label that needs to be replaced uniquely matches the start of its long label. For example, the short label "a2" needs to be replaced by the long label "a2 gender", which is the only long label that starts with "a2".
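Because each short label is the start of exactly one long label, a minimal base-R sketch is to recover the short label from each long label and look the column names up with `match()`. It assumes (the question does not say so) that a short label never contains a space, so everything before the first space is the short label; `short_of_long`, `idx` and `has_long` are illustrative names.

```r
df <- data.frame(text1 = c("nnnn", "uuuu", "ooo"),
                 text2 = c("b", "t", "eee"),
                 a1 = c(1, 2, 3),
                 a2 = c(45, 43, 23),
                 b1 = c(43, 6, 2),
                 text3 = c("gg", "ll", "jj"))
longnames <- c("a1 age", "a2 gender", "b1 postcode")

# Assumption: the short label is everything before the first space
short_of_long <- sub(" .*$", "", longnames)  # "a1" "a2" "b1"

# Position of each column name in short_of_long (NA if it has no long name)
idx <- match(names(df), short_of_long)

# Overwrite only the columns that have a long name
has_long <- !is.na(idx)
names(df)[has_long] <- longnames[idx[has_long]]
names(df)
```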
Upvotes: 1
Views: 1455
Reputation: 2022
One way to do it is with `sapply`; a `for` loop works as well, with almost exactly the same code. `seq.int(colnames(df))` produces the sequence `1:ncol(df)` (`seq_along(colnames(df))` is the more usual way to write this). `grep` finds the index in `longnames` of any long name that contains the current column name of `df`. The `if` condition then checks that this index vector is non-empty (it is whenever the column has a match), and only then makes the replacement.
## sapply (can be replaced with lapply)
## <<- is needed so the assignment reaches df outside the anonymous function
sapply(seq.int(colnames(df)), function(x) {
  index <- grep(colnames(df)[x], longnames)
  if (length(index) > 0) colnames(df)[x] <<- longnames[index]
})
OR
## for loop (plain <- is enough: the loop body is not a separate function scope)
for (x in seq.int(colnames(df))) {
  index <- grep(colnames(df)[x], longnames)
  if (length(index) > 0) colnames(df)[x] <- longnames[index]
}
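`grep()` treats the column name as a regular expression that may match anywhere in a long name, so a column `a1` would also hit a long name such as `a10 height` if one existed (that label is hypothetical). A stricter sketch of the same loop, assuming a short name is always followed by a space in its long name, uses the literal prefix test `startsWith()` and renames only on a single unambiguous match:

```r
df <- data.frame(text1 = c("nnnn", "uuuu", "ooo"),
                 text2 = c("b", "t", "eee"),
                 a1 = c(1, 2, 3),
                 a2 = c(45, 43, 23),
                 b1 = c(43, 6, 2),
                 text3 = c("gg", "ll", "jj"))
# "a10 height" is a hypothetical extra label with no matching column
longnames <- c("a1 age", "a2 gender", "b1 postcode", "a10 height")

for (x in seq_along(colnames(df))) {
  # Literal (non-regex) test: long name starts with the column name plus a space
  index <- which(startsWith(longnames, paste0(colnames(df)[x], " ")))
  # Rename only when exactly one long name qualifies
  if (length(index) == 1) colnames(df)[x] <- longnames[index]
}
colnames(df)
```

With the unanchored `grep("a1", longnames)`, both `"a1 age"` and `"a10 height"` would match.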
Upvotes: 1
Reputation: 79348
You could make use of `adist`, which is already vectorised:
## rows of a: (column of df, long name) pairs whose edit needs no substitutions
a = which(!attr(adist(names(df), longnames, counts = TRUE), 'counts')[, , 'sub'], arr.ind = TRUE)
names(df)[a[, 'row']] = longnames[a[, 'col']]
df
  text1 text2 a1 age a2 gender b1 postcode text3
1  nnnn     b      1        45          43    gg
2  uuuu     t      2        43           6    ll
3   ooo   eee      3        23           2    jj
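To see why the zero-substitution test picks out the right pairs, the `"sub"` slice of the counts array can be inspected directly. A column name that is a prefix of a long name can be turned into it by insertions alone, so its substitution count is 0. For this example every other pair needs at least one substitution, but that is a property of these particular strings: any column name whose characters appear in order inside a long name (not only as a prefix) also gets 0. `cnt`, `subs` and `hits` are illustrative names.

```r
df <- data.frame(text1 = c("nnnn", "uuuu", "ooo"),
                 text2 = c("b", "t", "eee"),
                 a1 = c(1, 2, 3),
                 a2 = c(45, 43, 23),
                 b1 = c(43, 6, 2),
                 text3 = c("gg", "ll", "jj"))
longnames <- c("a1 age", "a2 gender", "b1 postcode")

# counts = TRUE attaches an ins/del/sub array (x by y by 3) to the distances
cnt <- attr(adist(names(df), longnames, counts = TRUE), "counts")
subs <- cnt[, , "sub"]
dimnames(subs) <- list(names(df), longnames)  # label rows/columns for reading
subs

# (row, col) index pairs of the zero-substitution cells
hits <- which(subs == 0, arr.ind = TRUE)
hits
```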
Upvotes: 1
Reputation: 32558
m1 = sapply(names(df), function(snm) sapply(longnames, function(lnm) grepl(snm, lnm)))
df1 = setNames(df, replace(names(df), colSums(m1) == 1, longnames[rowSums(m1) == 1]))
df1
#   text1 text2 a1 age a2 gender b1 postcode text3
# 1  nnnn     b      1        45          43    gg
# 2  uuuu     t      2        43           6    ll
# 3   ooo   eee      3        23           2    jj
`m1` is a logical matrix showing the matches between the column names of `df` (its columns) and `longnames` (its rows). `colSums(m1) == 1` identifies the column names that have a match, and `rowSums(m1) == 1` identifies the respective matching long names.
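The `replace()` call pairs the TRUE positions of `colSums(m1) == 1` with the TRUE positions of `rowSums(m1) == 1` purely by order, so it assumes `longnames` lists the long labels in the same order as the matching columns appear in `df`. A sketch that reads the pairs straight out of `m1` with `which(arr.ind = TRUE)` drops that assumption; here the long names are deliberately shuffled, and `hits` is an illustrative name.

```r
df <- data.frame(text1 = c("nnnn", "uuuu", "ooo"),
                 text2 = c("b", "t", "eee"),
                 a1 = c(1, 2, 3),
                 a2 = c(45, 43, 23),
                 b1 = c(43, 6, 2),
                 text3 = c("gg", "ll", "jj"))
# Same labels as the question, but not in column order
longnames <- c("b1 postcode", "a1 age", "a2 gender")

# Rows = long names, columns = short names; TRUE where the short name occurs
m1 <- sapply(names(df), function(snm) sapply(longnames, function(lnm) grepl(snm, lnm)))

# Each TRUE cell is an explicit (long name row, column) pair
hits <- which(m1, arr.ind = TRUE)
names(df)[hits[, "col"]] <- longnames[hits[, "row"]]
names(df)
```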
Or use partial matching with `pmatch`:
inds = pmatch(colnames(df), longnames)
df1 = setNames(df, replace(longnames[inds], is.na(inds), colnames(df)[is.na(inds)]))
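`pmatch()` returns the index of the long name that a column name uniquely starts, or `NA` when there is no match or the prefix is ambiguous, which is why it fits the question's guarantee that each short label starts exactly one long label. A short sketch of that behaviour, plus the same renaming written with `ifelse()` as an equivalent alternative to `replace()`:

```r
longnames <- c("a1 age", "a2 gender", "b1 postcode")

# Unique prefix -> index into longnames; no prefix match -> NA
pmatch(c("text1", "a1", "a2", "b1"), longnames)

# Ambiguous prefix -> NA ("a" starts both "a1 age" and "a2 gender")
pmatch("a", longnames)

df <- data.frame(text1 = c("nnnn", "uuuu", "ooo"),
                 text2 = c("b", "t", "eee"),
                 a1 = c(1, 2, 3),
                 a2 = c(45, 43, 23),
                 b1 = c(43, 6, 2),
                 text3 = c("gg", "ll", "jj"))
inds <- pmatch(colnames(df), longnames)
# Keep the old name where there is no match, otherwise take the long name
df1 <- setNames(df, ifelse(is.na(inds), colnames(df), longnames[inds]))
names(df1)
```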
Upvotes: 1
Reputation: 3700
`dplyr::rename` can rename a subset of columns in one go, but it needs a named vector whose names are the new column names and whose values are the old ones.
library("tidyverse")
df <- data.frame(
text1 = c("nnnn", "uuuu", "ooo"),
text2 = c("b", "t", "eee"),
a1 = c(1, 2, 3),
a2 = c(45, 43, 23),
b1 = c(43, 6, 2),
text3 = c("gg", "ll", "jj")
)
longnames <- c("a1 age", "a2 gender", "b1 postcode")
shortnames <- str_extract(longnames, "^(\\w+)")
# named vector specifying how to rename
names(shortnames) <- longnames
shortnames
#>      a1 age   a2 gender b1 postcode
#>        "a1"        "a2"        "b1"
df %>%
rename(!!shortnames)
#>   text1 text2 a1 age a2 gender b1 postcode text3
#> 1  nnnn     b      1        45          43    gg
#> 2  uuuu     t      2        43           6    ll
#> 3   ooo   eee      3        23           2    jj
# In this case `!!shortnames` achieves this:
df %>%
rename("a1 age" = "a1",
"a2 gender" = "a2",
"b1 postcode" = "b1")
#>   text1 text2 a1 age a2 gender b1 postcode text3
#> 1  nnnn     b      1        45          43    gg
#> 2  uuuu     t      2        43           6    ll
#> 3   ooo   eee      3        23           2    jj
Created on 2019-03-28 by the reprex package (v0.2.1)
Specifying the new names programmatically is useful because the renaming specification is then easier to change cleanly. For readability, though, you can start with the explicit specification; it just takes more typing.
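For comparison, the same `new = old` lookup vector can be applied in base R without dplyr. This is a sketch of the idea, not how `rename()` works internally; recent dplyr documentation also shows passing such a vector as `rename(df, all_of(lookup))`. `lookup` and `pos` are illustrative names.

```r
df <- data.frame(text1 = c("nnnn", "uuuu", "ooo"),
                 text2 = c("b", "t", "eee"),
                 a1 = c(1, 2, 3),
                 a2 = c(45, 43, 23),
                 b1 = c(43, 6, 2),
                 text3 = c("gg", "ll", "jj"))
# rename()-style named vector: names are the new labels, values the old ones
lookup <- c("a1 age" = "a1", "a2 gender" = "a2", "b1 postcode" = "b1")

# Column positions of the old names; stop if any old name is missing
pos <- match(lookup, names(df))
stopifnot(!anyNA(pos))
names(df)[pos] <- names(lookup)
names(df)
```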
Upvotes: 2