Reputation: 71
For a string that has free text:
"The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5
from last monday. If they do not level up again to 100€ by the end of this week there might
be serious consequences to the company"
How can I write a regex pattern that extracts the currency-related numbers?
In this case: 89.99, 95, and 100.
So far, I've tried these patterns:
[0-9]*[€.]([0-9]*)
[0-9]{1,3}(?:\.[0-9]{3})*,[0-9][0-9]
[0-9]+\€\.[0-9]+
But none of these produces exactly what is needed.
Upvotes: 2
Views: 88
Reputation: 163447
One option is to match all 3 variations and afterwards remove the euro sign from the match.
(?:\d+€\d*|€\d+(?:\.\d+)?)
Explanation
(?:
Non-capturing group
\d+€\d*
Match 1+ digits and € followed by optional digits
|
Or
€\d+(?:\.\d+)?
Match € followed by digits and an optional decimal part
)
Close the non-capturing group
For example
import re
regex = r"(?:\d+€\d*|€\d+(?:\.\d+)?)"
test_str = ("\"The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5 \n"
"from last monday. If they do not level up again to 100€ by the end of this week there might \n"
"be serious consequences to the company\"")
print([x.replace("€", "") for x in re.findall(regex, test_str)])
Output
['89.99', '95', '100']
A slightly more precise pattern, which matches a number with optional comma-separated groups of 3 digits and a 2-digit decimal part, could be:
(?:\d+€\d*|€\d{1,3}(?:,\d{3})*\.\d{2})
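As a rough check of the stricter pattern above, here is a minimal sketch. The `sample` string and the variable names are made up for illustration and are not from the question. Note that the second alternative requires a 2-digit decimal part, so a value such as €50 is skipped.

```python
import re

# Stricter pattern from the answer: thousands groups and a mandatory 2-digit decimal part.
strict_pattern = re.compile(r"(?:\d+€\d*|€\d{1,3}(?:,\d{3})*\.\d{2})")

# Hypothetical sample text, invented for this illustration.
sample = "€1,234,567.89 and €89.99, then 9€5, 100€ and €50"

# Drop the euro sign from every match; thousands separators are left in place.
amounts = [m.replace("€", "") for m in strict_pattern.findall(sample)]
print(amounts)  # ['1,234,567.89', '89.99', '95', '100']
```

If plain integers after the sign (such as €50) should also match, the looser first pattern is probably the better fit.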
Upvotes: 1
Reputation: 36590
This needs further testing, but I would simply grab everything around €
that is not whitespace, that is:
import re
text = """The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5
from last monday. If they do not level up again to 100€ by the end of this week there might
be serious consequences to the company"""
values = re.findall(r"\S*€\S*", text)
print(values)
Output:
['€89.99', '9€5', '100€']
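Because `\S*` grabs any non-whitespace character, punctuation that touches the amount ends up in the match too. A possible clean-up step is sketched below. The `sample` string and the set of characters stripped are assumptions for illustration, not part of the original answer.

```python
import re

# Hypothetical sample with punctuation attached to the amounts.
sample = "fell by €89.99, a drop of 9€5, back to (100€)"

tokens = re.findall(r"\S*€\S*", sample)
print(tokens)  # ['€89.99,', '9€5,', '(100€)']

# Remove the euro sign, then trim surrounding punctuation (assumed set of characters).
numbers = [t.replace("€", "").strip(".,;:!?()\"'") for t in tokens]
print(numbers)  # ['89.99', '95', '100']
```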
Upvotes: 0