Reputation: 47
I am using the stringr package.
I am trying to remove everything up to and including the ":" or "|" delimiter in each string, but my code does not give the expected output.
Below is the sample data:
x <- c("Q3: AGE", "Q4: COUNTRY", "Q5: STATE, PROVINCE, COUNTY, ETC",
"Q6 | 100 Grand Bar", "Q6 | Anonymous brown globs that come in black and
orange wrappers\t(a.k.a. Mary Janes)",
"Q6 | Any full-sized candy bar", "Q6 | Black Jacks")
Below is my R code:
x %>%
str_replace_all("(.*: | .*\\|)", "")
Below is my expected result:
x <- c("AGE", "COUNTRY", "STATE, PROVINCE, COUNTY, ETC",
"100 Grand Bar", "Anonymous brown globs that come in black and orange
wrappers\t(a.k.a. Mary Janes)",
"Any full-sized candy bar", "Black Jacks")
Upvotes: 1
Views: 197
Reputation: 19716
Here is another regex:
gsub("^.*?(: |\\| )", "", x)
or
gsub("^.*?(:|\\|) ", "", x)
or
gsub("^.*?(:|\\|) ?", "", x) # if the vector mixes delimiters with and without a trailing space, e.g. `:text` and `| text`
#output
[1] "AGE"
[2] "COUNTRY"
[3] "STATE, PROVINCE, COUNTY, ETC"
[4] "100 Grand Bar"
[5] "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)"
[6] "Any full-sized candy bar"
[7] "Black Jacks"
^.*? - match as few characters as possible from the start of the string
(: |\\| ) - followed by ": " or "| "
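To illustrate why the lazy quantifier matters, here is a minimal sketch comparing greedy and lazy matching. The input string is a hypothetical one (not from the question) in which the delimiter appears twice, and perl = TRUE is an assumption made here so that the lazy quantifier follows PCRE semantics.

```r
# Hypothetical input in which the delimiter ": " appears twice
s <- "Q7: NOTE: see below"

# Greedy: .* grabs as much as it can, so the match ends at the LAST ": "
greedy <- sub("^.*: ", "", s, perl = TRUE)

# Lazy: .*? grabs as little as it can, so the match ends at the FIRST ": "
lazy <- sub("^.*?: ", "", s, perl = TRUE)

print(greedy)  # "see below"
print(lazy)    # "NOTE: see below"
```

With the sample data from the question both forms give the same result, because each string contains only one delimiter; the difference only shows up when the text after the prefix contains another ":" or "|".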
Upvotes: 1
Reputation: 51582
Here is an approach that splits on the delimiter instead of replacing the prefix,
unlist(sapply(strsplit(x, ': | [|] '), function(i) paste(trimws(i[-1]), collapse = ' ')))
#[1] "AGE"
#[2] "COUNTRY"
#[3] "STATE, PROVINCE, COUNTY, ETC"
#[4] "100 Grand Bar"
#[5] "Anonymous brown globs that come in black and \n orange wrappers\t(a.k.a. Mary Janes)"
#[6] "Any full-sized candy bar"
#[7] "Black Jacks"
# or, with a slightly different regex from @akrun's solution:
sub('Q[0-9]+: |Q[0-9]+ \\| ', '', x)
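To make the strsplit approach easier to follow, here is a small sketch that shows the intermediate pieces. The third input string is hypothetical, added only to show a caveat, and vapply is swapped in for unlist(sapply(...)) as a type-safe alternative.

```r
# First two strings are from the question; the third is a hypothetical edge case
x <- c("Q3: AGE", "Q6 | 100 Grand Bar", "Q7: NOTE: see below")

# Split on ": " or " | "; each element becomes a character vector of pieces
parts <- strsplit(x, ": | [|] ")
print(parts[[1]])  # "Q3" "AGE"

# Drop the first piece (the prefix) and glue the remaining pieces back together
res <- vapply(parts,
              function(p) paste(trimws(p[-1]), collapse = " "),
              character(1))
print(res)  # "AGE" "100 Grand Bar" "NOTE see below"
```

Note the caveat in the last element: a delimiter that reappears later in the text is consumed by the split and replaced by a single space, so this approach is only lossless when the delimiter occurs once per string.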
Upvotes: 0
Reputation: 887048
We can use sub to match zero or more characters that are not a : or | ([^:|]*) from the start (^) of the string, followed by a : or (|) a literal | (escaped as \\| because | is a metacharacter meaning OR), followed by zero or more whitespace characters (\\s*), and replace the match with an empty string ("").
sub("^[^:|]*(:|\\|)\\s*", "", x)
#[1] "AGE"
#[2] "COUNTRY"
#[3] "STATE, PROVINCE, COUNTY, ETC"
#[4] "100 Grand Bar"
#[5] "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)"
#[6] "Any full-sized candy bar"
#[7] "Black Jacks"
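The same idea can be sketched with a character class for the delimiter as well, since | needs no escaping inside [...]. The helper name strip_prefix and the last two input strings are hypothetical, chosen only to show how the pattern behaves on a repeated delimiter and on a string with no delimiter at all.

```r
# strip_prefix() is a hypothetical helper name, not part of any package
strip_prefix <- function(x) {
  # ^[^:|]*  run of non-delimiter characters from the start of the string
  # [:|]     the first ":" or "|" (no escaping needed inside [...])
  # \\s*     any whitespace that follows the delimiter
  sub("^[^:|]*[:|]\\s*", "", x)
}

x <- c("Q3: AGE", "Q6 | 100 Grand Bar", "Q7: NOTE: see below", "no delimiter")
res <- strip_prefix(x)
print(res)  # "AGE" "100 Grand Bar" "NOTE: see below" "no delimiter"
```

Because the negated class cannot cross a delimiter, the match always stops at the first one, and strings without a delimiter are returned unchanged. The same pattern should also work with stringr, for example str_replace(x, "^[^:|]*[:|]\\s*", ""), though that variant is not exercised here.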
Upvotes: 0