Reputation: 743
I'd like to insert an underscore after the first three characters of all variable names in a data frame. Any help would be much appreciated.
Current data frame:
df1 <- data.frame("genCrc_b1"=c(1,1,1),"genprd"=c(1,1,1) ,"genopr_b1_b2"=c(1,1,1))
Desired data frame:
df2 <- data.frame("gen_Crc_b1"=c(1,1,1),"gen_prd"=c(1,1,1) ,"gen_opr_b1_b2"=c(1,1,1))
My attempts:
gsub('^(.{3})(.*)$', "_", names(df1))
gsub('^(.{3})(.*)$', '\\_\\2', names(df1))
Upvotes: 1
Views: 2454
Reputation: 38510
You can also use regmatches<- to replace the matched sub-expressions.
regmatches(names(df1), regexpr("gen", names(df1), fixed=TRUE)) <- "gen_"
Now, check that the values have been properly changed.
names(df1)
[1] "gen_Crc_b1" "gen_prd" "gen_opr_b1_b2"
Here, regexpr finds the position of the first match of the literal string "gen" in each element of the character vector. These positions are fed to regmatches<-, which performs the substitution.
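To make the two steps easier to follow, here is a minimal sketch that runs them on a plain character vector instead of on names(df1). The vector nms and the intermediate variables are just illustrative names, not part of the original answer.

```r
# Illustrative copy of the column names from the question
nms <- c("genCrc_b1", "genprd", "genopr_b1_b2")

# regexpr() reports the first match in each element:
# its start position, plus a "match.length" attribute
m <- regexpr("gen", nms, fixed = TRUE)
starts    <- as.integer(m)             # 1 1 1
match_len <- attr(m, "match.length")   # 3 3 3

# regmatches() extracts the matched pieces described by m
matched <- regmatches(nms, m)          # "gen" "gen" "gen"

# regmatches<-() writes replacement text into those same positions
regmatches(nms, m) <- "gen_"
nms
# [1] "gen_Crc_b1"    "gen_prd"       "gen_opr_b1_b2"
```

Because regexpr only reports the first match per element, a later "gen" inside a name would be left alone.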
Upvotes: 2
Reputation: 39154
If your variable names all begin with gen, we can also do the following.
colnames(df1) <- gsub("gen", "gen_", colnames(df1), fixed = TRUE)
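One thing to keep in mind is that gsub with fixed = TRUE replaces every occurrence of "gen", not only the one at the start. If that matters for your data, an anchored sub is a possible variation. A small sketch, where "regen_b1" is a made-up name added only to show the difference:

```r
# "regen_b1" is a hypothetical name that contains "gen" but does not start with it
nms <- c("genCrc_b1", "genprd", "regen_b1")

# Unanchored, fixed replacement touches every "gen" it finds
all_hits <- gsub("gen", "gen_", nms, fixed = TRUE)
all_hits
# [1] "gen_Crc_b1" "gen_prd"    "regen__b1"

# Anchoring with ^ and using sub() limits the change to a leading "gen"
leading_only <- sub("^gen", "gen_", nms)
leading_only
# [1] "gen_Crc_b1" "gen_prd"    "regen_b1"
```

For the three names in the question both calls give the same result.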
Upvotes: 2
Reputation: 887531
We can use sub to capture the first 3 characters as a group ((.{3})) and, in the replacement, specify the backreference to that group (\\1) followed by an underscore:
names(df1) <- sub("^(.{3})", "\\1_", names(df1))
names(df1)
#[1] "gen_Crc_b1" "gen_prd" "gen_opr_b1_b2"
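As a quick check of how the pattern behaves at the edges, here is a small sketch on some made-up strings (they are not from the question). A name shorter than three characters cannot match .{3} and is returned unchanged, while a name of exactly three characters gets a trailing underscore.

```r
# Hypothetical names of length 2, 3 and 4
short_nms <- c("ab", "abc", "abcd")

res <- sub("^(.{3})", "\\1_", short_nms)
res
# [1] "ab"    "abc_"  "abcd_"
```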
In the OP's second attempt there were two capture groups, but only the second one (\\2) was referenced in the replacement, so the first three characters were dropped. Referencing both groups works:
gsub('^(.{3})(.*)$', '\\1_\\2', names(df1))
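BTW, gsub is not needed here: the pattern is anchored at the start of the string, so there is only a single replacement per name and sub is enough.
In the first attempt, no backreferences to the captured groups were used in the replacement, so the whole matched name was replaced by just the underscore.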
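The points above can be seen side by side in the following sketch. It uses sub on a plain vector nms (an illustrative name) and simplified versions of the replacements, so it is not a verbatim rerun of the OP's code.

```r
nms <- c("genCrc_b1", "genprd", "genopr_b1_b2")

# No backreference: the whole match (the entire name) becomes "_"
no_ref <- sub("^(.{3})(.*)$", "_", nms)
no_ref
# [1] "_" "_" "_"

# Only the second group: the first three characters are lost
only_second <- sub("^(.{3})(.*)$", "\\2", nms)
only_second
# [1] "Crc_b1"    "prd"       "opr_b1_b2"

# Both groups with an underscore in between
both <- sub("^(.{3})(.*)$", "\\1_\\2", nms)

# One group is enough, since the rest of the string is left untouched
one_group <- sub("^(.{3})", "\\1_", nms)

identical(both, one_group)
# [1] TRUE
```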
Upvotes: 5