replacing the nth character in a string only if it is a particular character in R

Question

I am importing a series of surveys as .csv files and combining them into one data set. The problem is that in one of the seven files, some of the variable names are imported slightly differently. The data set is huge, so I would like to write a function that I can run over the data set that is giving me trouble.

In some of the variable names there is an underscore where there should be a dot. Not all variable names share the same format, but the incorrect ones do: the underscore is always the 6th character of the column name.

I want R to look at the 6th character and, if it is an underscore, replace it with a dot. Here is a made-up example:

col_names <- c("s1.help_needed",
               "s1.Q2_im_stuck",
               "s1.Q2.im_stuck",
               "s1.Q3.regex",
               "s1.Q3_regex",
               "s2.Q1.is_confusing",
               "s2.Q2.answer_please",
               "s2.Q2_answer_please",
               "s2.someone_knows_the answer",
               "s3.appreciate_the_help")

I assume there is a regex answer to this, but I am struggling to find one. Perhaps there is also a tidyr answer?

Tim Biegeleisen · Accepted Answer

As @thelatemail pointed out, none of your data actually has an underscore in the fifth position, but some names have one in the sixth position (where others have a dot). A base R approach would be to use gsub():

result <- gsub("^(.{5})_", "\\1.", col_names)

> result
 [1] "s1.help_needed"              "s1.Q2.im_stuck"             
 [3] "s1.Q2.im_stuck"              "s1.Q3.regex"                
 [5] "s1.Q3.regex"                 "s2.Q1.is_confusing"         
 [7] "s2.Q2.answer_please"         "s2.Q2.answer_please"        
 [9] "s2.someone_knows_the answer" "s3.appreciate_the_help"
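To fix the imported data rather than a standalone vector, the same replacement can be assigned back to the column names. This is a minimal sketch, assuming a made-up data frame called survey_df that stands in for the problem file. Because the pattern is anchored with ^, it can match at most once per name, so sub() behaves the same as gsub() here.

```r
# survey_df is a hypothetical stand-in for the imported survey file
survey_df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(survey_df) <- c("s1.Q2_im_stuck", "s1.Q3.regex", "s1.help_needed")

# Replace an underscore in position 6 with a dot; all other names are untouched.
# Note the doubled backslash: "\\1" in an R string is the regex backreference \1.
names(survey_df) <- sub("^(.{5})_", "\\1.", names(survey_df))

names(survey_df)
```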

Here is an explanation of the regex:

^         from the start of the string
(.{5})    match AND capture any five characters
_         followed by an underscore

The quantity in parentheses is called a capture group and can be referenced in the replacement as \1 (written "\\1" inside an R string, because the backslash itself has to be escaped). So the regex says: replace the first six characters with the five characters we captured, followed by a dot as the sixth character.
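The same "check position n, then replace" logic can also be written without a regex, using base R's substr(). This is only a sketch: fix_nth is a hypothetical helper name, not an existing function, and its defaults are assumptions chosen to match the question.

```r
col_names <- c("s1.help_needed", "s1.Q2_im_stuck",
               "s1.Q3.regex", "s2.Q2_answer_please")

# fix_nth is a hypothetical helper, not part of base R or any package
fix_nth <- function(x, n = 6, from = "_", to = ".") {
  # TRUE where the nth character equals `from`; NA or too-short strings give FALSE
  hit <- !is.na(x) & substr(x, n, n) == from
  # Overwrite only that single position in the matching elements
  substr(x[hit], n, n) <- to
  x
}

fixed <- fix_nth(col_names)
fixed
```

Taking n, from and to as arguments means the same helper could be reused if another file turns out to have the stray character in a different position.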
