Reputation: 761
I am trying to exclude the "|" separator from my matches in a list of YouTube tags.
So far I can use a regex to select all the tags that are cute, for example in the string below:
cute|"cute nail art"|"cute"|"cute"|"fcute"
My regex highlights "cute" and cute| exactly. The problem is the trailing "|" in the second match. How do I get rid of it?
My regex is the following: ("\bcute\b")|(\bcute\b[^\s])
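A quick way to see where the stray | comes from is to run the regex above as is. This is a minimal sketch assuming Python's re module (the question does not say which regex engine is used): the [^\s] at the end of the second alternative matches any non-whitespace character, so it consumes the | that follows the bare tag.

```python
import re

tags = 'cute|"cute nail art"|"cute"|"cute"|"fcute"'

# The regex from the question, unchanged.
question_regex = r'("\bcute\b")|(\bcute\b[^\s])'

# [^\s] matches any non-whitespace character, so after the bare tag
# cute it also consumes the "|" separator that follows it.
matches = [m.group(0) for m in re.finditer(question_regex, tags)]
print(matches)  # ['cute|', '"cute"', '"cute"']
```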
My expected outcome is to highlight only cute and "cute", without the separator.
Any tips would be appreciated, and thank you for reading.
Upvotes: 0
Views: 48
Reputation: 72226
Assuming the input is a string of tags joined by | and some tags are enclosed in quotes, and you want to identify (and somehow mark) a certain tag both as is and quoted, the regular expression you need could look like this:
(?<=\||^)(cute|"cute")(?=\||$)
Check it in action here: https://regex101.com/r/acjM8R/3
(?<=        # start a positive lookbehind assertion
    ^       # match the beginning of the string
    |       # OR
    \|      # match the character '|' literally (it has a special meaning when not escaped)
)           # end of the lookbehind assertion
(           # start a capturing group; it is also used to group the alternatives
    cute    # match the word 'cute' (the tag) as is
    |       # OR
    "cute"  # match the word "cute" (the tag) when it is quoted
)           # end of the group
(?=         # start a positive lookahead assertion
    \|      # match the character '|' literally (it has a special meaning when not escaped)
    |       # OR
    $       # match the end of the string
)           # end of the lookahead assertion
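The commented pattern above can be used almost verbatim in free-spacing mode. The sketch below assumes Python's re module with the re.VERBOSE flag. One change is needed: Python only accepts a lookbehind whose alternatives all have the same width, so the zero-width ^ is moved out of the lookbehind as (?:^|(?<=\|)), which matches at the same positions. The name TAG_REGEX is illustrative only.

```python
import re

# Free-spacing (re.VERBOSE) version of the commented pattern.
# Assumption: Python's re needs a fixed-width lookbehind, so ^ is
# moved outside it as (?:^|(?<=\|)).
TAG_REGEX = re.compile(r'''
    (?:             # the tag starts either...
        ^           #   at the beginning of the string
      |             #   OR
        (?<=\|)     #   right after a literal '|' (lookbehind, consumes nothing)
    )
    (               # capturing group 1 holds the alternatives
        cute        #   the tag as is
      |             #   OR
        "cute"      #   the tag enclosed in quotes
    )
    (?=             # positive lookahead, consumes nothing
        \|          #   a literal '|'
      |             #   OR
        $           #   the end of the string
    )
''', re.VERBOSE)

tags = 'cute|"cute nail art"|"cute"|"cute"|"fcute"'
found = [m.group(1) for m in TAG_REGEX.finditer(tags)]
print(found)  # ['cute', '"cute"', '"cute"']
```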
The fragment ^|\| matches either the start of the string (^) or the character | (the separator). Similarly, the fragment \||$ matches either a | (the separator) or the end of the string.
A positive assertion is a test on the characters preceding ((?<= ... )) or following ((?= ... )) the current matching point; it does not consume any characters.
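To illustrate what "does not consume any characters" buys here, the following Python sketch (again assuming Python's re module and the same (?:^|(?<=\|)) rewrite of the lookbehind) compares a pattern that matches the separators directly with the lookaround version on the hypothetical input cute|cute. The consuming version includes the | in the match, which is the problem from the question, and because the separator is used up it cannot find the second tag.

```python
import re

# Hypothetical input: two adjacent tags separated by a single '|'.
tags = 'cute|cute'

# Consuming version: the separators are part of the match itself.
consuming = re.compile(r'(?:^|\|)cute(?:\||$)')
# Lookaround version: the separators are only checked, not matched.
lookaround = re.compile(r'(?:^|(?<=\|))cute(?=\||$)')

consuming_matches = [m.group(0) for m in consuming.finditer(tags)]
lookaround_matches = [m.group(0) for m in lookaround.finditer(tags)]
print(consuming_matches)   # ['cute|']
print(lookaround_matches)  # ['cute', 'cute']
```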
All in all, the regex above matches either cute or "cute", but only when it is surrounded by the delimiter | or by the string boundaries.
A different way to write (cute|"cute") is (("?)cute\2), which turns the whole regex into (?<=\||^)(("?)cute\2)(?=\||$).
The fragment ("?) captures an optional (?) quote ("). It is followed by the actual tag. The fragment \2 means "the same text that the second capture group matched", which in this case is the text matched by ("?).
This means that if ("?) matches something (a quote), \2 must also match a quote. If ("?) matches an empty string (there is no quote between | and cute), \2 also matches an empty string.
See it working here: https://regex101.com/r/acjM8R/4/
Upvotes: 1
Reputation: 3466
I assume what you are trying to do is match a literal |. In that case you need to escape it as \|, because an unescaped | means OR (alternation) in a regex.
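A short sketch of the difference, assuming Python's re module: unescaped, | separates alternatives; escaped as \|, it matches the separator character itself, which is also what you need when splitting the tag list.

```python
import re

# Unescaped, | is the alternation operator: the pattern means cute OR nail.
print(bool(re.fullmatch(r'cute|nail', 'cute')))       # True
print(bool(re.fullmatch(r'cute|nail', 'cute|nail')))  # False: neither alternative spans the whole text

# Escaped, \| matches the literal separator character.
print(bool(re.fullmatch(r'cute\|nail', 'cute|nail')))  # True
print(bool(re.fullmatch(r'cute\|nail', 'cute')))       # False

# The escaped form also splits the tag list into individual tags.
parts = re.split(r'\|', 'cute|"cute nail art"|"cute"')
print(parts)  # ['cute', '"cute nail art"', '"cute"']
```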
Upvotes: 0