Chemjong

Reputation: 47

Regex operator to remove multiple strings

Package in use: stringr

I am trying to remove everything up to and including the first ":" or "|" (plus the space that follows it), but my code is not giving me the expected output.

Below is the sample data:

x <- c("Q3: AGE", "Q4: COUNTRY", "Q5: STATE, PROVINCE, COUNTY, ETC", 
"Q6 | 100 Grand Bar", "Q6 | Anonymous brown globs that come in black and 
orange wrappers\t(a.k.a. Mary Janes)", 
"Q6 | Any full-sized candy bar", "Q6 | Black Jacks")

Below is my R code:

library(stringr)  # also makes the %>% pipe available

x %>% 
str_replace_all("(.*: | .*\\|)", "")

Below is my expected result:

x <- c("AGE", "COUNTRY", "STATE, PROVINCE, COUNTY, ETC", 
"100 Grand Bar", "Anonymous brown globs that come in black and orange 
wrappers\t(a.k.a. Mary Janes)", 
"Any full-sized candy bar", "Black Jacks")

Upvotes: 1

Views: 197

Answers (3)

missuse

Reputation: 19716

Here is another regex:

gsub("^.*?(: |\\| )", "", x)

or

gsub("^.*?(:|\\|) ", "", x)

or

gsub("^.*?(:|\\|) ?", "", x) # handles a mix of delimiters with and without a trailing space, e.g. `:text` and `| text`
#output
[1] "AGE"                                                                                        
[2] "COUNTRY"                                                                                    
[3] "STATE, PROVINCE, COUNTY, ETC"                                                               
[4] "100 Grand Bar"                                                                              
[5] "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)"
[6] "Any full-sized candy bar"                                                                   
[7] "Black Jacks"  

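^.*? - lazily match as few characters as possible from the start of the string
(: |\\| ) - a ": " or a "| " (colon or pipe, each followed by a space); the pipe is escaped because a bare | means alternation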
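To illustrate why the lazy quantifier matters, here is a minimal sketch comparing `.*?` with the greedy `.*`. The input string is hypothetical (it is not in the question's data) and was chosen because it contains both delimiters; `perl = TRUE` is an assumption added only to pin down the regex engine, whereas the calls above use R's default engine.

```r
# Hypothetical string containing both delimiters, chosen to expose the difference.
s <- "Q6 | Candy: full-sized"

# Lazy .*? stops at the first delimiter, so only the "Q6 | " prefix is removed.
lazy <- sub("^.*?(:|\\|) ?", "", s, perl = TRUE)

# Greedy .* runs to the last delimiter, so "Candy: " is removed as well.
greedy <- sub("^.*(:|\\|) ?", "", s, perl = TRUE)

print(lazy)    # "Candy: full-sized"
print(greedy)  # "full-sized"
```

Because the pattern is anchored with ^, it can match at most once per string, so sub() would do the same job as gsub() here.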

Upvotes: 1

Sotos

Reputation: 51582

Here is an approach that splits the strings instead of substituting (strsplit still takes a regex for the delimiter):

unlist(sapply(strsplit(x, ': | [|] '), function(i) paste(trimws(i[-1]), collapse = ' ')))

#[1] "AGE"                                                                                      
#[2] "COUNTRY"                                                                                  
#[3] "STATE, PROVINCE, COUNTY, ETC"                                                             
#[4] "100 Grand Bar"                                                                            
#[5] "Anonymous brown globs that come in black and \n       orange wrappers\t(a.k.a. Mary Janes)"
#[6] "Any full-sized candy bar"                                                                 
#[7] "Black Jacks"

#or with a slightly different regex than @akrun's solution,

sub('Q[0-9]+: |Q[0-9]+ \\| ', '', x)

Upvotes: 0

akrun

Reputation: 887048

We can use sub to match zero or more characters that are not a : or | ([^:|]*) from the start (^) of the string, followed by a : or a | (written \\| because | is a metacharacter meaning OR), followed by zero or more whitespace characters (\\s*), and replace the whole match with an empty string ("").

sub("^[^:|]*(:|\\|)\\s*", "", x)
#[1] "AGE"                                                                               
#[2] "COUNTRY"                                                                           
#[3] "STATE, PROVINCE, COUNTY, ETC"                                                      
#[4] "100 Grand Bar"                                                                     
#[5] "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)"
#[6] "Any full-sized candy bar"                                                          
#[7] "Black Jacks"           
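Since the question uses stringr, the same pattern can presumably be passed to str_replace(), which, like sub(), replaces only the first match. The sketch below assumes stringr is installed and uses a shortened, two-element version of the sample vector; the names `y`, `original`, and `cleaned` are illustrative only. It also shows what the pattern from the question does: its second alternative begins with a literal space, so on the pipe rows it matches only " |" and leaves the "Q6" prefix in place.

```r
library(stringr)

# Shortened sample: one colon-delimited and one pipe-delimited string.
y <- c("Q3: AGE", "Q6 | 100 Grand Bar")

# Pattern from the question: the pipe row keeps its "Q6" prefix.
original <- str_replace_all(y, "(.*: | .*\\|)", "")

# Negated-class pattern from this answer, used with stringr.
cleaned <- str_replace(y, "^[^:|]*(:|\\|)\\s*", "")

print(original)  # "AGE" "Q6 100 Grand Bar"
print(cleaned)   # "AGE" "100 Grand Bar"
```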

Upvotes: 0
