Reputation: 1115
I am trying to write a regex that returns the parts of a string that are NOT inside parentheses or brackets. The parentheses always contain a year, and the brackets can contain any normal characters, upper and lower case. I was going about it by writing patterns for the brackets and parentheses and then wrapping them in [^\regex] to exclude them (is this right?)
Here's the string:
s = 'Some words (1999) [THINGS]'
And the regex:
/[^(\(\d{4}\))|\[.*\]]/
But this includes the characters inside the brackets (see http://rubular.com/r/bbpcnnGgCI).
Everything works up until I add the [^\regex].
For example, this works to get (1999):
>> puts s.match(/\(\d{4}\)/)
(1999)
And this works for what's in brackets:
>> puts s.match(/\[.*\]/)
[THINGS]
But when I put them together using | for "or":
>> puts s.match(/\(\d{4}\)|\[.*\]/)
(1999)
...it only matches the parentheses and their contents.
What's going on here? What am I doing wrong?
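For context on that last example (a sketch added for illustration, not part of the original question): String#match returns only the first, leftmost match, which is why the alternation reports (1999) alone. String#scan keeps going and collects every non-overlapping match. The variable names below are my own.

```ruby
s = 'Some words (1999) [THINGS]'
pattern = /\(\d{4}\)|\[.*\]/

# match stops at the first (leftmost) match of the alternation.
first = s.match(pattern)[0]
puts first # prints (1999)

# scan continues through the string and collects every match.
all = s.scan(pattern)
p all # prints ["(1999)", "[THINGS]"]
```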
Upvotes: 1
Views: 7348
Reputation: 61519
(\(\d{4}\))|\[.*\]
means "four digits surrounded by parentheses, and also captured in a group; or anything between square brackets".
[^...] does not mean "anything that isn't matched by ...". [] sets up a character set, which is negated if it starts with ^. So [^(\(\d{4}\))|\[.*\]] means "a character that is not an open parenthesis or an open parenthesis or a digit or an open brace or a 4 or a close brace or a close parenthesis or a close parenthesis or a pipe or an open square bracket or a period or a star or a close square bracket".
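To see this in Ruby (a quick sketch, not from the original answer): the negated class matches one character at a time, so every letter of THINGS qualifies and only the listed punctuation and digits are skipped. That is why Rubular highlights the bracket contents. The variable names are illustrative.

```ruby
s = 'Some words (1999) [THINGS]'

# Negated character class: any single character NOT in the set
# ( ) { } | [ ] . * 4 or a digit. The repeated ( and ) are redundant.
neg = /[^(\(\d{4}\))|\[.*\]]/

# match finds only the first such character.
first_char = s.match(neg)[0]
p first_char # prints "S"

# scan collects every such character; the letters inside [...] survive.
kept = s.scan(neg).join
p kept # prints "Some words  THINGS"
```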
You want to match "any text that is not in parentheses or brackets". This is not easily expressed as a regex directly. What you really want to do is split the string using "any parenthesized or bracketed item" as a delimiter.
I don't know the Ruby syntax, but in Python this looks like:
import re
pattern = re.compile(r"(?:\[[^\]]*\])|(?:\(\d{4}\))")
pattern.split('Some words (1999) [THINGS]') # ['Some words ', ' ', '']
That gives you the individual pieces, assuming you need them. If you're just going to join them up again, then the "replace the delimiters with empty strings" (i.e. gsub) approach works just fine.
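A Ruby sketch of both ideas (my translation of the Python above, not the answerer's code). One difference worth knowing: Ruby's String#split drops trailing empty strings unless you pass a negative limit. The squeeze/strip tidy-up at the end is an extra step I added for readable output.

```ruby
s = 'Some words (1999) [THINGS]'

# Same delimiter as the Python pattern: a [...] group or a (YYYY) year.
delimiter = /\[[^\]]*\]|\(\d{4}\)/

# A negative limit keeps the trailing empty string, matching Python's output.
pieces = s.split(delimiter, -1)
p pieces # prints ["Some words ", " ", ""]

# If you only need the leftover text, gsub the delimiters away,
# then collapse repeated spaces and trim the ends.
cleaned = s.gsub(delimiter, '').squeeze(' ').strip
p cleaned # prints "Some words"
```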
Upvotes: 3
Reputation: 425033
What about looking at this from the opposite direction: try replacing the year pattern \(\d{4}\) and the bracket pattern \[[^\]]*\] with blank "", then you'll have what you want:
s.gsub(/\(\d{4}\)|\[[^\]]*\]/, "")
EDITED: To incorporate syntax correction suggested by @rick (thx @rick!)
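An easy slip with this approach (sketched here for illustration, not from the original answer): String#gsub treats a String pattern literally, so the pattern must be a Regexp literal for \d and the escaped parentheses to mean anything. The variable names are my own.

```ruby
s = 'Some words (1999) [THINGS]'

# A String pattern is matched character for character: this looks for the
# literal text \(\d{4}\), finds nothing, and returns s unchanged.
literal = s.gsub('\(\d{4}\)', '')
p literal # prints "Some words (1999) [THINGS]"

# A Regexp removes the year; an extra alternative also removes [...].
year_only = s.gsub(/\(\d{4}\)/, '')
both = s.gsub(/\(\d{4}\)|\[[^\]]*\]/, '').squeeze(' ').strip
p year_only # prints "Some words  [THINGS]"
p both      # prints "Some words"
```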
Upvotes: 0
Reputation: 22643
Try this: /\(.+/, which will match everything from the opening ( onwards. If you strip that out, you're left with 'Some words', which should be what you need?
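A minimal Ruby sketch of that truncation (illustrative, assuming the words you want always come before the first "("; anything after it is discarded):

```ruby
s = 'Some words (1999) [THINGS]'

# /\(.+/ matches from the first "(" through the end of the string,
# so sub deletes everything from there on.
head = s.sub(/\(.+/, '')
p head # prints "Some words "

# strip removes the space that was left in front of the "(".
p head.strip # prints "Some words"
```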
Two points: first, this will break if there is a ( appearing earlier in the string. Second, by the way, I find this rather valuable when trying to come up with Regex patterns.
Edit: This pattern should only match stuff in brackets even if there is a stray bracket earlier in the string.
string.gsub(/(\(|\[).+(\)|\])/, '')
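One behavior of the edited pattern worth checking (a sketch using a hypothetical title, not from the question): .+ is greedy, so the match runs from the first ( or [ to the last ) or ], and any words between two groups are removed too. A stricter pattern, assuming parentheses always hold a four-digit year, removes each group on its own. The constant names GREEDY and STRICT are my own.

```ruby
GREEDY = /(\(|\[).+(\)|\])/   # the pattern from this answer
STRICT = /\(\d{4}\)|\[[^\]]*\]/ # assumes (...) always holds a 4-digit year

title = 'A (1999) middle [X] end' # hypothetical input

# Greedy .+ spans from "(" to the final "]", swallowing "middle".
greedy_result = title.gsub(GREEDY, '')
p greedy_result # prints "A  end"

# The stricter pattern removes each group separately.
strict_result = title.gsub(STRICT, '')
p strict_result # prints "A  middle  end"
```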
Upvotes: 5
Reputation: 1701
If you need something that matches multiple sets of brackets in a string mixed with words, this will work: http://rubular.com/r/rvcO4TyBLq
((\(\d{4}\))|(\[[^\]]+\]))+
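A Ruby sketch of using that idea (my adaptation, with a hypothetical sample string). It swaps the capturing groups for non-capturing (?:...) ones: with capturing groups, String#scan returns arrays of captures instead of the matched text. The trailing + merges groups that sit directly next to each other, such as [B][C].

```ruby
s = 'Title (2001) [A] more words [B][C]' # hypothetical input

# Same alternation as above, but non-capturing so scan returns plain strings.
groups = /(?:\(\d{4}\)|\[[^\]]+\])+/

found = s.scan(groups)
p found # prints ["(2001)", "[A]", "[B][C]"]

# Remove the groups and tidy the leftover spaces.
rest = s.gsub(groups, '').squeeze(' ').strip
p rest # prints "Title more words"
```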
Upvotes: 0