Sam Gilbert

Reputation: 1702

Issue removing two patterns (string and numbers) using regex in R

I am trying to remove two patterns using str_replace in R.

The patterns that I would like to remove are \\d+_ and Baskets.

I first tried:

> library(stringr)
> variables <- c("1_SmallBaskets", "2_Medium", "3_High")
> str_replace(variables, "Baskets|\\d+_", "")
[1] "SmallBaskets" "Medium"       "High"

As far as I can make out, because the pattern \\d+_ matches first, it is replaced, but str_replace then moves on to the next string without replacing Baskets.

I then tried making the expression global by appending /g (example below), but this seems to match only Baskets:

> str_replace(variables, "Baskets|\\d+_/g", "")
[1] "1_Small"  "2_Medium" "3_High"

I have tested that the syntax Small|High works, i.e. it replaces Small or High, so I don't understand why the same logic doesn't apply when trying to replace a digit pattern and a string.

Upvotes: 3

Views: 786

Answers (1)

Wiktor Stribiżew

Reputation: 626738

str_replace replaces only the first match in each string. str_replace_all replaces every match in each string. Compare:

> library(stringr)
> variables <- c("1_SmallBaskets", "2_Medium", "3_High")
> str_replace(variables, "Baskets|\\d+_", "")
[1] "SmallBaskets" "Medium"       "High"        
> str_replace_all(variables, "Baskets|\\d+_", "")
[1] "Small"  "Medium" "High"  

You can also just use base R's gsub here:

> gsub("Baskets|\\d+_", "", variables)
[1] "Small"  "Medium" "High"  
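
The second attempt in the question can likely be explained the same way. R regex strings have no /g flag, so the characters /g are treated as part of the pattern, and the \\d+_/g alternative never matches these strings. Below is a minimal sketch in base R, assuming the same variables vector; sub and gsub are used as the base R counterparts of str_replace and str_replace_all, and the result names (with_slash_g, first_only, all_matches) are only illustrative:

```r
variables <- c("1_SmallBaskets", "2_Medium", "3_High")

# "/g" is not a flag in R; it is part of the pattern, so the second
# alternative only matches digits followed by the literal text "_/g".
# Only the "Baskets" alternative can match here.
with_slash_g <- sub("Baskets|\\d+_/g", "", variables)
print(with_slash_g)

# sub() stops after the first match in each string, so the leading
# "1_" is removed and "Baskets" is left in place.
first_only <- sub("Baskets|\\d+_", "", variables)
print(first_only)

# gsub() keeps going and removes every match in each string.
all_matches <- gsub("Baskets|\\d+_", "", variables)
print(all_matches)
```

Whether a replacement is "global" is therefore decided by which function is called (sub vs gsub, or str_replace vs str_replace_all), not by anything written inside the pattern.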

Upvotes: 2
