Reputation: 163
Folks, how can I replace the second occurrence of a dot in column names?
Sample data:
age.range.abc = sample(c("ar2-15", "ar16-29", "ar30-44"), 200, replace = TRUE)
gender.region.q = sample(c("M", "F"), 200, replace = TRUE)
region_g.a = sample(c("A", "B", "C"), 200, replace = TRUE)
physi = sample(c("Poor", "Average", "Good"), 200, replace = TRUE)
survey = data.frame(age.range.abc, gender.region.q, region_g.a, physi)
head(survey)
I tried this, but it replaces all dots with underscores. I want to replace only the second (or any later) occurrence with an underscore.
names(survey) = gsub("\\.", "_", names(survey))
names(survey)
# [1] "age_range_abc" "gender_region_q" "region_g_a" "physi"
Thanks, J
Upvotes: 3
Views: 3617
Reputation: 39657
You can use sub with the pattern (\\.[^.]*)\\., where:
- \\. matches a literal .
- [^.] matches any single character except .
- * repeats the previous item 0 or more times
- the parentheses ( ) store the matched text, which the replacement refers to as \\1:
sub("(\\.[^.]*)\\.", "\\1_", names(survey))
#[1] "age.range_abc" "gender.region_q" "region_g.a" "physi"
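If it helps to see what the parentheses actually store, base R's regmatches() combined with regexec() returns the whole match plus each captured group, i.e. the text that \\1 puts back. This is a small sketch; the column names are typed in directly (an assumption for convenience) so it runs without the survey data frame.

```r
# Column names from the question, typed in directly so this runs standalone
nms <- c("age.range.abc", "gender.region.q", "region_g.a", "physi")

# regexec() records the full match plus each parenthesised group;
# regmatches() extracts them as a list with one element per name
# (character(0) where the pattern does not match)
m <- regmatches(nms, regexec("(\\.[^.]*)\\.", nms))
str(m)

# Element [2] of each non-empty result is the captured group, i.e. what "\\1" restores
groups <- sapply(m, function(x) if (length(x) > 1) x[2] else NA_character_)
groups  # ".range" ".region" NA NA
```

Names with fewer than two dots ("region_g.a", "physi") produce no match, which is why sub() leaves them untouched.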
To be more explicit, the pattern ^([^.]*\\.[^.]*)\\. can be used, where the leading ^ anchors the match to the start of the string:
sub("^([^.]*\\.[^.]*)\\.", "\\1_", names(survey))
#[1] "age.range_abc" "gender.region_q" "region_g.a" "physi"
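The anchored pattern generalises to "replace the n-th dot" by repeating the "skip a run of non-dots and one dot" piece n - 1 times. A rough sketch follows; replace_nth_dot is a hypothetical helper, not a base R function, and it uses perl = TRUE because the non-capturing group (?: ) is PCRE syntax.

```r
# Hypothetical helper (not part of base R): replace the n-th "." in each
# string with "_", extending the anchored ^([^.]*\\.[^.]*)\\. pattern.
replace_nth_dot <- function(x, n) {
  stopifnot(n >= 1)
  # (?:[^.]*\.){n-1} skips the first n - 1 dots, [^.]* runs up to the n-th dot,
  # and everything before that dot is captured so "\\1" can restore it
  pattern <- sprintf("^((?:[^.]*\\.){%d}[^.]*)\\.", as.integer(n - 1))
  sub(pattern, "\\1_", x, perl = TRUE)
}

replace_nth_dot(c("age.range.abc", "gender.region.q", "region_g.a", "physi"), 2)
# "age.range_abc" "gender.region_q" "region_g.a" "physi"
```

With n = 2 the pattern is equivalent to the anchored one above; strings with fewer than n dots are returned unchanged.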
Upvotes: 0
Reputation: 37641
In the spirit of your original code:
names(survey) = sub("(\\..*?)\\.", "\\1_", names(survey))
names(survey)
#[1] "age.range_abc" "gender.region_q" "region_g.a" "physi"
A little extra detail in case it helps:
- \\. matches the first .
- .*? : the . matches any character, and .* matches zero or more instances of any character. But that matching is greedy; it would match as much as possible. I want matching that is not greedy (only up to the second .), so I added ? to suppress the greedy match: .*? matches any group of characters up until we hit the next thing in the regex, which is...
- another \\. to match the second .

Because the first part is enclosed in parentheses, (\\..*?), it is stored as \1, so the substitution pattern \\1_ restores everything before the second . and replaces the second . with _.
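The greedy/non-greedy difference only becomes visible with three or more dots, so here is a quick comparison on a made-up name (an assumption; none of the question's columns has three dots). I pass perl = TRUE so the lazy .*? is guaranteed Perl semantics; the answer's own code uses R's default engine, which gives the output shown above for the question's names.

```r
# A hypothetical name with three dots, so greedy and lazy matching differ
x <- "a.b.c.d"

# Greedy: ".*" runs as far as it can, so the group swallows up to the LAST dot
greedy <- sub("(\\..*)\\.", "\\1_", x, perl = TRUE)

# Lazy: ".*?" stops as early as it can, so the SECOND dot is the one replaced
lazy <- sub("(\\..*?)\\.", "\\1_", x, perl = TRUE)

greedy  # "a.b.c_d"
lazy    # "a.b_c.d"
```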
Upvotes: 12
Reputation: 887088
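One option is strsplit: split each name on the dots, keep the first piece, and join the remaining pieces with underscores: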
names(survey) <- sapply(strsplit(names(survey), "[.]"), function(x)
  if (length(x) > 1) paste(x[1], paste(x[-1], collapse = "_"), sep = ".") else x)
names(survey)
#[1] "age.range_abc" "gender.region_q" "region_g.a" "physi"
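Unlike the sub() answers, this approach turns every dot after the first into an underscore, which matches the "second or more occurrences" wording of the question. A short sketch to show the difference, using hypothetical names (the question's data never has more than two dots):

```r
# Hypothetical names; "a.b.c.d" has three dots
nms <- c("age.range.abc", "a.b.c.d", "physi")

# strsplit approach: keep the first piece, join the rest with "_"
split_version <- sapply(strsplit(nms, "[.]"), function(x)
  if (length(x) > 1) paste(x[1], paste(x[-1], collapse = "_"), sep = ".") else x)

# Anchored sub() approach: only the second dot changes
sub_version <- sub("^([^.]*\\.[^.]*)\\.", "\\1_", nms)

split_version  # "age.range_abc" "a.b_c_d" "physi"
sub_version    # "age.range_abc" "a.b_c.d" "physi"
```

One caveat: strsplit() drops a trailing empty piece, so a name ending in a dot (e.g. "a.b.") loses that final dot with this approach.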
Upvotes: 1