Hannie

Reputation: 427

R: Gsub replacing pattern with skipping a character in replacement

I want to do a simple replacement using the gsub() function in R. For example:

#I want: 
Huiswaard 2 Oost
Huiswaard 1 Zuid
Huiswaard 2 West

#To become:
Huiswaard-2-Oost
Huiswaard-1-Oost
Huiswaard-2-Oost 

Using the magnificent method of trial and error, I tried these:

data <- gsub('Huiswaard\\s.\\s>*', "Huiswaard-.-", df)
data <- gsub('Huiswaard\\s.\\s>*', "Huiswaard-.*-", df)
data <- gsub('Huiswaard\\s.\\s>*', "Huiswaard-(.)-", df)
data <- gsub('Huiswaard\\s.\\s>*', "Huiswaard-\\(\\)-", df)

None of them work. I end up with output like:

Huiswaard-.-West

Does anyone know how to use gsub() to skip a character in the replacement argument?

Upvotes: 0

Views: 632

Answers (1)

jasbner

Reputation: 2283

In regex you can capture a group with parentheses and back-reference it in the replacement with \\1:

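df <- c("Huiswaard 2 Oost", "Huiswaard 1 Zuid", "Huiswaard 2 West")
# (\\d) captures the digit; \\1 inserts the captured digit back into the replacement
data <- gsub('Huiswaard\\s(\\d)\\s', "Huiswaard-\\1-", df)
data
[1] "Huiswaard-2-Oost" "Huiswaard-1-Zuid" "Huiswaard-2-West"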
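A small sketch of why the attempts in the question fail and how several groups can be combined. In the replacement string only back-references like \\1 are special, so "." or "(.)" are inserted literally. The variable names `x`, `literal` and `grouped` are illustrative only, and the anchored three-group pattern assumes every value has the shape "word digit word".

```r
x <- c("Huiswaard 2 Oost", "Huiswaard 1 Zuid", "Huiswaard 2 West")

# In the replacement, "." is a literal dot, not a wildcard,
# which reproduces the "Huiswaard-.-West" style output from the question
literal <- gsub("Huiswaard\\s.\\s", "Huiswaard-.-", x)

# Three capture groups (name, digit, trailing word);
# \\1, \\2 and \\3 re-insert them in order, joined by hyphens
grouped <- sub("^(\\w+)\\s(\\d)\\s(\\w+)$", "\\1-\\2-\\3", x)

print(literal)
print(grouped)
```

With the anchors ^ and $, values that do not match the assumed shape are simply returned unchanged by sub().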

If you also want to change the suffix, match the trailing word with \\w+, which matches one or more word characters after the space:

data <- gsub('Huiswaard\\s(\\d)\\s\\w+', "Huiswaard-\\1-Oost", df)
data
[1] "Huiswaard-2-Oost" "Huiswaard-1-Oost" "Huiswaard-2-Oost"
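The answer treats `df` as a character vector. If it is actually a data frame, passing the whole object to gsub() coerces it with as.character(), which gives one deparsed string per column rather than the cleaned values. A minimal sketch, assuming the strings live in a hypothetical column named `location`:

```r
# Hypothetical data frame; the column name "location" is an assumption
df <- data.frame(
  location = c("Huiswaard 2 Oost", "Huiswaard 1 Zuid", "Huiswaard 2 West"),
  stringsAsFactors = FALSE
)

# Apply gsub() to the character column and write the result back into it
df$location <- gsub("Huiswaard\\s(\\d)\\s", "Huiswaard-\\1-", df$location)

print(df$location)
```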

I use this cheat sheet to help me understand regular expressions: https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf

Upvotes: 3
