Joff

Reputation: 12177

regex to match between certain characters

I have strings like this...

"1. yada yada yada (This is a string; "This is a thing")
 2. blah blah blah (This is also a string)"

I want to return...

['this is a string', 'this is also a string']

So it should match everything between '(' and ';', or between '(' and ')'.

This is what I have so far in Python. It matches the sections I want, but I can't figure out how to trim them down to return only the part I really want inside them...

import re
pattern = re.compile(r'\([a-zA-Z ;"]+\)|\([a-zA-Z ]+\)')
re.findall(pattern, string_to_search)

it returns this...

['(This is a string; "This is a thing")', '(This is also a string)']

EDIT ADDED FOR MORE INFO:

I realized there are more parentheses above the numbered text sections that I want to omit...

"some text and stuff (some more info)
 1. yada yada yada (This is a string; "This is a thing")
 2. blah blah blah (This is also a string)"

I don't want to match "(some more info)", but I am not sure how to include only the text after the numbers (e.g. 1. lskdfjlsdjfds(string I want)).

Upvotes: 1

Views: 1352

Answers (2)

Lav

Reputation: 2274

I would suggest

^[^\(]*\(([^;\)]+)

Splitting it into parts:

# ^         - start of string (or of each line when re.MULTILINE is set)
# [^\(]*    - everything that's not an opening bracket
# \(        - opening bracket
# ([^;\)]+) - capture everything that's not semicolon or closing bracket

Unless, of course, you wish to impose (or drop) some requirements on the "blah blah blah" part.

You can drop the first two parts, but then it will match some things it probably shouldn't... or maybe it should. It all depends on what your objectives are.

P.S. I missed that you want to find all instances, so the multiline flag needs to be set:

pattern = re.compile(r'^[^\(]*\(([^;\)]+)', re.MULTILINE)
matches = pattern.findall(string_to_search)

It is important to anchor the match at the beginning of the line, because your input could contain a nested parenthesis later on the same line:

"""1. yada yada yada (This is a string; "This is a (thing)")
2. blah blah blah (This is also a string)"""
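
To make the difference concrete, here is a small sketch that runs the anchored pattern and an unanchored variant against the input above. The variable names (`text`, `anchored`, `unanchored`) are only illustrative.

```python
import re

# The input from above, with a nested "(thing)" on the first line.
text = '''1. yada yada yada (This is a string; "This is a (thing)")
2. blah blah blah (This is also a string)'''

# Anchored: only the first "(" on each line can start a match.
anchored = re.compile(r'^[^\(]*\(([^;\)]+)', re.MULTILINE)

# Unanchored: every "(" can start a match, including the nested one.
unanchored = re.compile(r'\(([^;\)]+)')

print(anchored.findall(text))
# ['This is a string', 'This is also a string']
print(unanchored.findall(text))
# ['This is a string', 'thing', 'This is also a string']
```

The anchored version skips the nested "(thing)" because a match can only begin at the start of a line.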

Upvotes: 1

Wiktor Stribiżew

Reputation: 626689

You can use

\(([^);]+)


Note the capturing group I set with the help of unescaped parentheses: the value captured with this subpattern is returned by the re.findall method, not the whole match.

It matches

  • \( - a literal (
  • ([^);]+) - matches and captures 1 or more characters other than ) or ;

Python demo:

import re
p = re.compile(r'\(([^);]+)')
test_str = "1. yada yada yada (This is a string; \"This is a thing\")\n2. blah blah blah (This is also a string)"
print(p.findall(test_str)) # => ['This is a string', 'This is also a string']
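
For the follow-up in the question's edit (skipping parentheses on lines that are not numbered), one possible sketch is to require a numbered prefix before the parenthesis. This assumes every wanted line starts with optional spaces or tabs, then digits and a period. The names `numbered` and `text` are illustrative, not part of the original question.

```python
import re

# Input from the question's edit: the first line has parentheses
# but no leading number, so it should be skipped.
text = '''some text and stuff (some more info)
 1. yada yada yada (This is a string; "This is a thing")
 2. blah blah blah (This is also a string)'''

# ^[ \t]*\d+\.  - line start, optional indentation, a number and a period
# [^(\n]*       - the rest of the line up to the first "("
# \(([^);]+)    - capture everything up to the first ")" or ";"
numbered = re.compile(r'^[ \t]*\d+\.[^(\n]*\(([^);]+)', re.MULTILINE)

print(numbered.findall(text))
# ['This is a string', 'This is also a string']
```

If the numbering can take another form (letters, or a closing bracket instead of a period), the prefix part of the pattern would need to change accordingly.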

Upvotes: 2
