Jennifer
Reputation: 163

How to replace the second and later occurrences of a dot in a column name

Folks, how can I replace the second occurrence of a dot in column names?

Sample data:

age.range.abc = sample(c("ar2-15", "ar16-29", "ar30-44"), 200, replace = TRUE)
gender.region.q = sample(c("M", "F"), 200, replace = TRUE)
region_g.a = sample(c("A", "B", "C"), 200, replace = TRUE)
physi = sample(c("Poor", "Average", "Good"), 200, replace = TRUE)
survey = data.frame(age.range.abc, gender.region.q, region_g.a,physi)
head(survey)

I tried this, but it replaces all dots with underscores. I want to replace only the second and later occurrences with an underscore.

names(survey) = gsub("\\.", "_", names(survey))
names(survey)
# [1] "age_range_abc"   "gender_region_q" "region_g_a"      "physi" 

Thanks, J

Upvotes: 3

Views: 3617

Answers (3)

GKi
Reputation: 39657

You can use sub with (\\.[^.]*)\\. where

\\. matches .

[^.] matches any character except a .

* matches it 0 or more times

The parentheses ( ) capture the match so the replacement can refer to it, here as \\1:

sub("(\\.[^.]*)\\.", "\\1_", names(survey))
#[1] "age.range_abc"   "gender.region_q" "region_g.a"      "physi"          

To be more explicit ^([^.]*\\.[^.]*)\\. can be used where the first ^ indicates the start of the string:

sub("^([^.]*\\.[^.]*)\\.", "\\1_", names(survey))
#[1] "age.range_abc"   "gender.region_q" "region_g.a"      "physi"          
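
Note that sub replaces only the first match of its pattern, so both calls above change just the second dot; a name with three or more dots would keep the later ones. If every dot after the first should become an underscore, one possible approach is to cut each name at its first dot and substitute in the remainder. This is only a sketch in base R: replace_after_first_dot is a hypothetical helper name and "a.b.c.d" is a made-up column name, neither comes from the question.

```r
# Hypothetical helper (name is illustrative): keep the first dot,
# turn every later dot into `replacement`.
replace_after_first_dot <- function(x, replacement = "_") {
  # Position of the first literal dot in each name; -1 when there is none
  first <- as.integer(regexpr(".", x, fixed = TRUE))
  first[first < 0] <- 0L                   # no dot: empty head, whole name is "rest"
  head <- substr(x, 1, first)              # up to and including the first dot
  rest <- substr(x, first + 1, nchar(x))   # everything after the first dot
  paste0(head, gsub(".", replacement, rest, fixed = TRUE))
}

x <- c("a.b.c.d", "age.range.abc", "region_g.a", "physi")

sub("^([^.]*\\.[^.]*)\\.", "\\1_", x)   # only the second dot changes
# "a.b_c.d" "age.range_abc" "region_g.a" "physi"

replace_after_first_dot(x)              # the second and all later dots change
# "a.b_c_d" "age.range_abc" "region_g.a" "physi"
```

On the sample survey data both give the same names, since no column there has more than two dots.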

Upvotes: 0

G5W
Reputation: 37641

In the spirit of your original code:

names(survey) = sub("(\\..*?)\\.", "\\1_", names(survey))
names(survey)
# [1] "age.range_abc"   "gender.region_q" "region_g.a"      "physi" 

A little extra detail in case it helps.

\\. matches the first .
.*? The . matches any character. .* matches zero or more instances of any character. But the matching is greedy; it would match as much as possible. I want matching that is not greedy (only up until the second .) so I added ? to suppress the greedy match and .*? matches any group of characters up until we hit the next thing in the regex which is ...
another \\. to match the second ..
Because the first part was enclosed in parentheses (\\..*?) it is stored as \1, so the substitution pattern \\1_ restores everything before the second . and the second . is replaced with the _ .
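
The greedy versus non-greedy difference does not show up on the sample names, because none of them has more than two dots. A minimal sketch with a made-up name containing three dots may make it easier to see. The name "a.b.c.d" and the perl = TRUE argument are assumptions added here, the latter so that the quantifiers follow Perl-style rules:

```r
x <- "a.b.c.d"   # illustrative name, not from the survey data

# Greedy .* runs as far as it can, so the LAST dot is the one replaced
greedy <- sub("(\\..*)\\.", "\\1_", x, perl = TRUE)

# Non-greedy .*? stops at the next dot, so the SECOND dot is replaced
lazy <- sub("(\\..*?)\\.", "\\1_", x, perl = TRUE)

greedy  # "a.b.c_d"
lazy    # "a.b_c.d"
```

In both cases sub makes a single replacement, so any other dots in the name are left alone.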

Upvotes: 12

akrun
Reputation: 887088

One option is strsplit

names(survey) <- sapply(strsplit(names(survey), "[.]"), function(x) 
    if(length(x) >1) paste(x[1], paste(x[-1], collapse="_"), sep=".") else x)
names(survey)
#[1] "age.range_abc"   "gender.region_q" "region_g.a"      "physi"  

Upvotes: 1
