Reputation: 5004
I want to match the contents of the parentheses that contain "per contract", omitting unwanted elements such as the "=" in the third line, for input like this:
1/100 of a cent ($0.0001) per pound ($6.00 per contract) and
.001 Index point (10 Cents per contract) and
$.00025 per pound (=$10 per contract)
I'm using the following regex:
r'.*?\([^$]*([\$|\d][^)]* per contract)\)'
This works well for any expression inside the parentheses that starts with a $, but for the second line it omits the 1 from 10 Cents. I'm not sure what's going on here.
Upvotes: 0
Views: 87
Reputation: 50190
for the second line, it omits the 1 from 10 Cents. Not sure what's going on here.
What's going on is that [^$]* is greedy: it will happily match digits too, and backs off only far enough to leave a single digit for the [\$|\d] that follows it. (So if you wrote (199 cents per contract), you'd only get 9 cents per contract.) Fix it by making the quantifier lazy, [^$]*?, instead:
r'.*?\([^$]*?([\$|\d][^)]* per contract)\)'
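To see the difference, here is a minimal sketch that runs both patterns over the three sample lines from the question. The names SAMPLES, GREEDY, LAZY, and extract are made up for illustration.

```python
import re

# The three sample lines from the question.
SAMPLES = [
    "1/100 of a cent ($0.0001) per pound ($6.00 per contract) and",
    ".001 Index point (10 Cents per contract) and",
    "$.00025 per pound (=$10 per contract)",
]

# Original pattern: the greedy [^$]* swallows leading digits.
GREEDY = re.compile(r'.*?\([^$]*([\$|\d][^)]* per contract)\)')
# Fixed pattern: the lazy [^$]*? stops at the first "$" or digit it can.
LAZY = re.compile(r'.*?\([^$]*?([\$|\d][^)]* per contract)\)')


def extract(pattern, line):
    """Return the captured group, or None if the line does not match."""
    m = pattern.match(line)
    return m.group(1) if m else None


greedy_results = [extract(GREEDY, s) for s in SAMPLES]
lazy_results = [extract(LAZY, s) for s in SAMPLES]

print(greedy_results)  # second entry is '0 Cents per contract'
print(lazy_results)    # second entry is '10 Cents per contract'
```

Only the second line differs between the two runs, because it is the only one where digits rather than a $ start the amount.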
Upvotes: 1
Reputation: 2188
This will match the output you specified in your comments:
re.search(r'\((([^)]+) per contract)\)', str).group(1)
Upvotes: 0
Reputation: 89547
You can use:
r'(?<=\()[^=][^)]*? per contract(?=\))'
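A quick check against the sample lines from the question suggests that this pattern skips the third line altogether, since the character right after "(" there is "=". The sketch below shows that, along with one possible variant (an assumption, not part of this answer) that also lets the match start right after an "=". The names SAMPLES, LOOKAROUND, VARIANT, and first_match are made up for illustration.

```python
import re

# The three sample lines from the question.
SAMPLES = [
    "1/100 of a cent ($0.0001) per pound ($6.00 per contract) and",
    ".001 Index point (10 Cents per contract) and",
    "$.00025 per pound (=$10 per contract)",
]

# Pattern from the answer: the first character after "(" must not be "=".
LOOKAROUND = re.compile(r'(?<=\()[^=][^)]*? per contract(?=\))')
# Hypothetical variant: the match may start after "(" or after "=".
VARIANT = re.compile(r'(?<=[(=])[^=(][^)]*? per contract(?=\))')


def first_match(pattern, line):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(line)
    return m.group(0) if m else None


lookaround_results = [first_match(LOOKAROUND, s) for s in SAMPLES]
variant_results = [first_match(VARIANT, s) for s in SAMPLES]

print(lookaround_results)  # third entry is None
print(variant_results)     # third entry is '$10 per contract'
```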
Upvotes: 0
Reputation: 191729
You could probably use a less specific regex:
re.findall(r'\(([^)]+) per contract\)', str)
This captures just the amount before " per contract", e.g. "$6.00" and "10 Cents".
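As a rough check, the sketch below runs this pattern over the three sample lines from the question. On the third line the capture still includes the leading "=", so a variant with an optional "=" outside the group is shown as well. That variant is an assumption of this sketch, not part of the answer, and the names SAMPLES, SIMPLE, and STRIP_EQ are made up for illustration.

```python
import re

# The three sample lines from the question.
SAMPLES = [
    "1/100 of a cent ($0.0001) per pound ($6.00 per contract) and",
    ".001 Index point (10 Cents per contract) and",
    "$.00025 per pound (=$10 per contract)",
]

# Pattern from the answer: capture everything between "(" and " per contract)".
SIMPLE = r'\(([^)]+) per contract\)'
# Hypothetical variant: consume an optional "=" before the captured amount.
STRIP_EQ = r'\(=?([^)]+) per contract\)'

simple_results = [re.findall(SIMPLE, s) for s in SAMPLES]
strip_eq_results = [re.findall(STRIP_EQ, s) for s in SAMPLES]

print(simple_results)    # [['$6.00'], ['10 Cents'], ['=$10']]
print(strip_eq_results)  # [['$6.00'], ['10 Cents'], ['$10']]
```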
Upvotes: 2