LLL

Reputation: 743

Inserting an underscore into a particular part of variable names in R

I'd like to insert an underscore after the first three characters of all variable names in a data frame. Any help would be much appreciated.

Current data frame:

df1 <- data.frame("genCrc_b1"=c(1,1,1),"genprd"=c(1,1,1) ,"genopr_b1_b2"=c(1,1,1))

Desired data frame:

df2 <- data.frame("gen_Crc_b1"=c(1,1,1),"gen_prd"=c(1,1,1) ,"gen_opr_b1_b2"=c(1,1,1))

My attempts:

gsub('^(.{3})(.*)$', "_", names(df1))
gsub('^(.{3})(.*)$', '\\_\\2', names(df1))

Upvotes: 1

Views: 2454

Answers (3)

lmo

Reputation: 38510

You can also use regmatches<- to replace the sub-expressions.

regmatches(names(df1), regexpr("gen", names(df1), fixed=TRUE)) <- "gen_"

Now, check that the values have been properly changed.

names(df1)
[1] "gen_Crc_b1"    "gen_prd"       "gen_opr_b1_b2"

Here, regexpr finds the position of the first match of the pattern "gen" in each element of the character vector. These match positions are passed to regmatches<-, which performs the substitution.
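To see what regmatches<- actually receives, it can help to inspect the regexpr result on its own. This is a small sketch on a copy of the question's names; the vector name nms is only illustrative.

```r
nms <- c("genCrc_b1", "genprd", "genopr_b1_b2")

# regexpr returns the start of the first match in each element;
# the match lengths are stored in the "match.length" attribute
m <- regexpr("gen", nms, fixed = TRUE)
as.integer(m)            # start positions
attr(m, "match.length")  # length of each match

# regmatches<- replaces exactly those matched substrings
regmatches(nms, m) <- "gen_"
nms
```

Because regexpr reports only the first match per element, a later "gen" in the same name would be left alone.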

Upvotes: 2

www

Reputation: 39154

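If your variable names all begin with gen, we can also do the following. Note that gsub replaces every occurrence of gen, so this assumes gen appears only at the start of each name.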

colnames(df1) <- gsub("gen", "gen_", colnames(df1), fixed = TRUE)
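If a name might contain gen more than once, an anchored sub is a possible alternative. The names below are hypothetical and are not taken from the question's data frame; they only illustrate the difference.

```r
# Hypothetical names; the second one contains "gen" twice
nms <- c("genprd", "gengen_b1")

# gsub with fixed = TRUE replaces every occurrence of "gen"
res_all <- gsub("gen", "gen_", nms, fixed = TRUE)
res_all
# [1] "gen_prd"     "gen_gen__b1"

# sub with an anchored pattern replaces only a leading "gen"
res_first <- sub("^gen", "gen_", nms)
res_first
# [1] "gen_prd"    "gen_gen_b1"
```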

Upvotes: 2

akrun

Reputation: 887531

We can use sub to capture the first 3 characters as a group ((.{3})) and, in the replacement, specify the backreference to that group (\\1) followed by an underscore.

names(df1) <- sub("^(.{3})", "\\1_", names(df1))
names(df1)
#[1] "gen_Crc_b1"    "gen_prd"       "gen_opr_b1_b2"

In the OP's second attempt, there were two capture groups, but the replacement referenced only the second one (\\2). Referencing both gives the desired result:

gsub('^(.{3})(.*)$', '\\1_\\2', names(df1))

BTW, gsub is not needed here, because the anchored pattern can match only once per name; sub is enough.

In the OP's first attempt, no backreferences to the captured groups were used in the replacement, so each whole name was replaced by "_".
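The points above can be checked side by side on a copy of the original names. This is a sketch; the variable names first_try, both_groups and one_group are introduced here only for illustration.

```r
nms <- c("genCrc_b1", "genprd", "genopr_b1_b2")

# First attempt: the pattern matches each whole name and the
# replacement has no backreference, so only "_" is left
first_try <- gsub('^(.{3})(.*)$', "_", nms)
first_try
# [1] "_" "_" "_"

# Both groups referenced: the name is rebuilt around the underscore
both_groups <- gsub('^(.{3})(.*)$', '\\1_\\2', nms)

# A single group with sub gives the same result
one_group <- sub("^(.{3})", "\\1_", nms)
identical(both_groups, one_group)
# [1] TRUE
```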

Upvotes: 5
