Reputation: 12177
I have strings like this...
"1. yada yada yada (This is a string; "This is a thing")
2. blah blah blah (This is also a string)"
I want to return...
['This is a string', 'This is also a string']
So it should match everything between '(' and ';', or between '(' and ')'.
This is what I have so far in Python. It matches the sections I want, but I can't figure out how to cut them down to return only what I really want inside them...
import re
pattern = re.compile(r'\([a-zA-Z ;"]+\)|\([a-zA-Z ]+\)')
re.findall(pattern, text)  # text holds the string shown above
It returns this...
['(This is a string; "This is a thing")', '(This is also a string)']
EDIT ADDED FOR MORE INFO:
I realized there are more parentheses above the numbered text sections that I want to omit...
"some text and stuff (some more info)
1. yada yada yada (This is a string; "This is a thing")
2. blah blah blah (This is also a string)"
I don't want to match "(some more info)", but I am not sure how to include only the text after the numbers (e.g. 1. lskdfjlsdjfds(string I want)).
Upvotes: 1
Views: 1352
Reputation: 2274
I would suggest
^[^\(]*\(([^;\)]+)
Splitting it into parts:
# ^ - start of string
# [^\(]* - everything that's not an opening bracket
# \( - opening bracket
# ([^;\)]+) - capture everything that's not semicolon or closing bracket
Unless, of course, you wish to impose (or drop) some requirements on the "blah blah blah" part.
You can drop the first two parts, but then it will match some things it probably shouldn't... or maybe it should. It all depends on what your objectives are.
P.S. I missed that you want to find all instances, so the multiline flag needs to be set (with it, ^ matches at the start of every line):
pattern = re.compile(r'^[^\(]*\(([^;\)]+)', re.MULTILINE)
matches = pattern.findall(string_to_search)
It is important to check for the beginning of the line, because your input can contain nested parentheses that an unanchored pattern would also match:
"""1. yada yada yada (This is a string; "This is a (thing)")
2. blah blah blah (This is also a string)"""
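To make the difference concrete, here is a small sketch that runs the anchored pattern and an unanchored variant against that input. The variable names (text, anchored, unanchored) are only illustrative and not part of the original answer.

```python
import re

# Sample input from above, with a nested "(thing)" inside the quotes.
text = ('1. yada yada yada (This is a string; "This is a (thing)")\n'
        '2. blah blah blah (This is also a string)')

# Anchored at the start of each line: at most one match per line.
anchored = re.compile(r'^[^\(]*\(([^;\)]+)', re.MULTILINE)
# Same capture without the anchor: every "(" starts a new match.
unanchored = re.compile(r'\(([^;\)]+)')

anchored_result = anchored.findall(text)
unanchored_result = unanchored.findall(text)

print(anchored_result)    # ['This is a string', 'This is also a string']
print(unanchored_result)  # ['This is a string', 'thing', 'This is also a string']
```

The anchored version only ever captures the first parenthesised group on a line, which is why the nested "(thing)" is skipped.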
Upvotes: 1
Reputation: 626689
You can use
\(([^);]+)
The regex demo is available here.
Note the capturing group I set with the help of unescaped parentheses: the value captured with this subpattern is returned by the re.findall method, not the whole match.
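As a quick illustration of that re.findall behaviour, the sketch below runs the same pattern with and without the capturing group. The names text, whole and captured are made up for this example.

```python
import re

text = ('1. yada yada yada (This is a string; "This is a thing")\n'
        '2. blah blah blah (This is also a string)')

# No capturing group: findall returns the whole match, "(" included.
whole = re.findall(r'\([^);]+', text)
# One capturing group: findall returns only the captured part.
captured = re.findall(r'\(([^);]+)', text)

print(whole)     # ['(This is a string', '(This is also a string']
print(captured)  # ['This is a string', 'This is also a string']
```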
It matches:
\( - a literal (
([^);]+) - matches and captures 1 or more characters other than ) or ;
import re
p = re.compile(r'\(([^);]+)')
test_str = "1. yada yada yada (This is a string; \"This is a thing\")\n2. blah blah blah (This is also a string)"
print(p.findall(test_str)) # => ['This is a string', 'This is also a string']
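The edit to the question also asks to skip parentheses on lines that do not start with a number. One possible way to handle that, as a sketch only: anchor the pattern to lines that begin with digits followed by a period. The pattern below and the names numbered and edited_str are my own assumptions, not something from the answers above, and it assumes the wanted parenthesis sits on the same line as the number.

```python
import re

# Input from the question's edit: the first line has parentheses to ignore.
edited_str = ('some text and stuff (some more info)\n'
              '1. yada yada yada (This is a string; "This is a thing")\n'
              '2. blah blah blah (This is also a string)')

# ^\d+\.     - line must start with a number and a period
# [^(\n]*    - anything up to the first "(" on that same line
# \(         - a literal (
# ([^);]+)   - capture up to the first ) or ;
numbered = re.compile(r'^\d+\.[^(\n]*\(([^);]+)', re.MULTILINE)

numbered_result = numbered.findall(edited_str)
print(numbered_result)  # ['This is a string', 'This is also a string']
```

Excluding \n from the first character class keeps a match from running over into the following line when a numbered line has no parentheses at all.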
Upvotes: 2