James Hallen

Reputation: 5004

Regex in Python for matching contents inside ()

I want to match the contents of the parentheses that contain "per contract", while omitting unwanted characters such as the "=" on the third line. The input looks like this:

1/100 of a cent ($0.0001) per pound ($6.00 per contract) and 
.001 Index point (10 Cents per contract) and 
$.00025 per pound (=$10 per contract)

I'm using the following regex:

r'.*?\([^$]*([\$|\d][^)]* per contract)\)'

This works well for any expression inside the parentheses that starts with a $, but on the second line it drops the 1 from 10 Cents. I'm not sure what's going on here.
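A minimal sketch that reproduces the problem, assuming each sample line is searched on its own with re.search (the question doesn't say how the text is fed in). The LINES list and the extract helper are illustrative names, not part of the question.

```python
import re

# Sample input from the question, one string per line (assumed to be searched separately).
LINES = [
    "1/100 of a cent ($0.0001) per pound ($6.00 per contract) and",
    ".001 Index point (10 Cents per contract) and",
    "$.00025 per pound (=$10 per contract)",
]

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r'.*?\([^$]*([\$|\d][^)]* per contract)\)')


def extract(pattern, line):
    """Hypothetical helper: return group 1 of the first match, or None."""
    m = pattern.search(line)
    return m.group(1) if m else None


results = [extract(ORIGINAL, line) for line in LINES]
for r in results:
    print(r)
# $6.00 per contract
# 0 Cents per contract   <- the leading "1" is lost
# $10 per contract
```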

Upvotes: 0

Views: 87

Answers (4)

alexis

Reputation: 50190

for the second line, it omits the 1 from 10 Cents. Not sure what's going on here.

What's going on is that [^$]* is greedy: it happily matches digits and leaves just one digit to satisfy the [\$|\d] that follows it. (So for (199 Cents per contract) you'd only get 9 Cents per contract.) Fix it by making that part lazy with [^$]*? instead:

r'.*?\([^$]*?([\$|\d][^)]* per contract)\)'
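A quick check of the lazy version on all three sample lines, again assuming each line is searched separately. As an aside, inside a character class | is a literal character rather than alternation, so [\$|\d] also accepts a pipe; [$\d] is probably what was intended. The sketch runs the answer's pattern next to that tidied variant (TIDIED is my own naming) to show they agree on this input.

```python
import re

LINES = [
    "1/100 of a cent ($0.0001) per pound ($6.00 per contract) and",
    ".001 Index point (10 Cents per contract) and",
    "$.00025 per pound (=$10 per contract)",
]

# The answer's pattern: [^$]*? is lazy, so it consumes as few characters as possible
# and leaves the whole number for the capture group.
LAZY = re.compile(r'.*?\([^$]*?([\$|\d][^)]* per contract)\)')
# Same idea without the stray literal "|" in the character class.
TIDIED = re.compile(r'.*?\([^$]*?([$\d][^)]* per contract)\)')

lazy_results = [LAZY.search(line).group(1) for line in LINES]
tidied_results = [TIDIED.search(line).group(1) for line in LINES]
print(lazy_results)
# ['$6.00 per contract', '10 Cents per contract', '$10 per contract']
```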

Upvotes: 1

butch

Reputation: 2188

This will match the output you specified in your comments:

re.search(r'\((([^)]+) per contract)\)', text).group(1)
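A sketch applying this pattern line by line (the loop and the LINES list are mine, not the answer's). group(1) is everything inside the parentheses, so the "=" on the third line is kept; group(2), the inner group, only drops the " per contract" suffix.

```python
import re

LINES = [
    "1/100 of a cent ($0.0001) per pound ($6.00 per contract) and",
    ".001 Index point (10 Cents per contract) and",
    "$.00025 per pound (=$10 per contract)",
]

# Raw string so the backslashes reach the regex engine untouched.
PATTERN = re.compile(r'\((([^)]+) per contract)\)')

full = []   # group(1), e.g. "$6.00 per contract"
value = []  # group(2), e.g. "$6.00"
for line in LINES:
    m = PATTERN.search(line)
    if m:
        full.append(m.group(1))
        value.append(m.group(2))

print(full)
# ['$6.00 per contract', '10 Cents per contract', '=$10 per contract']
print(value)
# ['$6.00', '10 Cents', '=$10']
```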

Upvotes: 0

Casimir et Hippolyte

Reputation: 89547

You can use:

r'(?<=\()[^=][^)]*? per contract(?=\))'
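A sketch running this pattern with re.findall over the whole sample, assuming the three lines are joined with newlines. Because the lookbehind and lookahead check for the parentheses without consuming them, findall returns the contents directly. Note that [^=] must match the first character after "(", so the third line, whose contents start with "=", produces no match at all rather than "$10 per contract".

```python
import re

TEXT = "\n".join([
    "1/100 of a cent ($0.0001) per pound ($6.00 per contract) and",
    ".001 Index point (10 Cents per contract) and",
    "$.00025 per pound (=$10 per contract)",
])

# (?<=\() and (?=\)) assert the parentheses without including them in the match;
# [^=] rejects contents that begin with "=".
PATTERN = re.compile(r'(?<=\()[^=][^)]*? per contract(?=\))')

matches = PATTERN.findall(TEXT)
print(matches)
# ['$6.00 per contract', '10 Cents per contract']
```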

Upvotes: 0

Explosion Pills

Reputation: 191729

You could probably use a less specific regex:

re.findall(r'\(([^)]+) per contract\)', text)

This will match "$6.00" and "10 Cents" (and "=$10", with the "=", on the third line).
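A sketch with re.findall over the sample text, assuming the three lines are joined with newlines. Since [^)]+ captures everything after the "(", the third capture keeps its "=". One possible tweak, which is my own variant and not part of the answer, is an optional =? right after \( so a leading "=" is consumed outside the capture group.

```python
import re

TEXT = "\n".join([
    "1/100 of a cent ($0.0001) per pound ($6.00 per contract) and",
    ".001 Index point (10 Cents per contract) and",
    "$.00025 per pound (=$10 per contract)",
])

# The answer's pattern: group 1 is whatever sits between "(" and " per contract)".
LOOSE = re.compile(r'\(([^)]+) per contract\)')
# Hypothetical variant: "=?" swallows an optional "=" before the capture.
NO_EQUALS = re.compile(r'\(=?([^)]+) per contract\)')

loose = LOOSE.findall(TEXT)
no_equals = NO_EQUALS.findall(TEXT)
print(loose)      # ['$6.00', '10 Cents', '=$10']
print(no_equals)  # ['$6.00', '10 Cents', '$10']
```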

Upvotes: 2
