Reputation: 365
I'm trying to write a regex pattern in Python that will only match the substrings which have a currency symbol/word attached.
String example:
'50,000 and £200.6m, 50p, 500m euro, 800bn euros, $15bn and $99.99. The year 2006 is 20% larger.'
Expected matches:
I do not want numbers that are not currency related to match, such as plain numbers, percentages or years.
My attempt:
(^(£|\$)?)\d+([.,]\d*)?|(\d+([.,]\d*)?(p|(bn|m)|(\seuro(s)))$)
My matches:
Evidently the regex isn't working at all as I expect. It should either match a substring if it begins with a currency symbol or if it ends with one.
Upvotes: 0
Views: 859
Reputation: 20737
Something like this would do it:
(?:£|\$)(?:\d*\.)?\d+(?:m|bn)?|(?:\d*\.)?\d+(?:m|bn)? ?(?:p|euros?)
Just keep adding the currencies you wish to catch to the respective (?:£|\$) or (?:p|euros?) sections. Ditto for adding items to (?:m|bn).
https://regex101.com/r/3K9jmR/1
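As a rough sketch of the "keep adding" idea, the pattern can be assembled from plain lists so that new currencies do not require editing the regex by hand. The helper name build_currency_pattern and its parameters are made up for this illustration and are not part of any library; the generated pattern follows the same two-branch shape as the one above.

```python
import re

def build_currency_pattern(symbols, units, magnitudes):
    """Hypothetical helper: assemble the two-branch currency pattern from lists."""
    def alt(items):
        # Longest first so "euros" is tried before "euro"; re.escape handles "$".
        return "|".join(re.escape(s) for s in sorted(items, key=len, reverse=True))

    number = r"(?:\d*\.)?\d+"                 # digits with an optional decimal part
    magnitude = rf"(?:{alt(magnitudes)})?"    # optional m / bn
    return (
        rf"(?:{alt(symbols)}){number}{magnitude}"      # branch 1: symbol in front
        rf"|{number}{magnitude} ?(?:{alt(units)})"     # branch 2: unit word behind
    )

text = ("50,000 and £200.6m, 50p, 500m euro, 800bn euros, "
        "$15bn and $99.99. The year 2006 is 20% larger.")

pattern = build_currency_pattern(["£", "$"], ["p", "euro", "euros"], ["m", "bn"])
matches = re.findall(pattern, text)
print(matches)
# ['£200.6m', '50p', '500m euro', '800bn euros', '$15bn', '$99.99']
```

Sorting the alternatives longest-first is a design choice for this sketch: with "euro" listed before "euros", the regex would stop at "euro" and leave the trailing "s" unmatched.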
Upvotes: 2
Reputation: 163362
You might use a pattern that matches either a pound or dollar sign [£$] followed by the value, optionally followed by p, m or bn.
Or match the value followed by p, m or bn, optionally followed by a space and euro with an optional s.
(?:[£$]\d+(?:\.\d+)?(?:[pm]|bn)?|\d+(?:\.\d+)?(?:[pm]|bn)(?: euros?)?)
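A minimal sketch of running this pattern in Python against the example string from the question. The names CURRENCY, text and matches are chosen just for this illustration. Since the pattern only contains non-capturing groups, re.findall returns the full matched substrings.

```python
import re

# The pattern from this answer, compiled once for reuse.
CURRENCY = re.compile(
    r"(?:[£$]\d+(?:\.\d+)?(?:[pm]|bn)?|\d+(?:\.\d+)?(?:[pm]|bn)(?: euros?)?)"
)

text = ("50,000 and £200.6m, 50p, 500m euro, 800bn euros, "
        "$15bn and $99.99. The year 2006 is 20% larger.")

matches = CURRENCY.findall(text)
print(matches)
# ['£200.6m', '50p', '500m euro', '800bn euros', '$15bn', '$99.99']

# Two edge cases worth knowing about with this exact pattern:
print(CURRENCY.findall("500 euro"))  # [] because p, m or bn is required before " euro"
print(CURRENCY.findall("500m"))      # ['500m'] because the " euro" part is optional
```

If a bare amount such as 500 euro should also match, or a lone 500m should not, the second alternative would need adjusting.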
Explanation
(?: opens a non-capturing group
[£$] is a character class matching either £ or $
\d+(?:\.\d+)? matches 1+ digits with an optional decimal part
(?:[pm]|bn)? optionally matches p, m or bn
| means or
\d+(?:\.\d+)? matches 1+ digits with an optional decimal part
(?:[pm]|bn) matches p, m or bn
(?: euros?)? optionally matches a space and euro with an optional s
) closes the group
Upvotes: 4