gammauniversity

Reputation: 71

How to find all currency-related digits with a regex?

For a string that has free text:

"The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5 
from last monday. If they do not level up again to 100€ by the end of this week there might 
be serious consequences to the company"

What regex pattern will extract the currency-related numbers?

In this case: 89.99, 95, and 100.

So far, I've tried these patterns:

[0-9]*[€.]([0-9]*)
\[0-9]{1,3}(?:\.\[0-9]{3})*,\[0-9]\[0-9]
[0-9]+\€\.[0-9]+

But none of these produces exactly what is needed.

Upvotes: 2

Views: 88

Answers (3)

The fourth bird

Reputation: 163447

One option is to match all 3 variations and afterwards remove the euro sign from the match.

(?:\d+€\d*|€\d+(?:\.\d+)?)

Explanation

  • (?: Non capture group
    • \d+€\d* Match 1+ digits, then €, followed by optional digits
    • | Or
    • €\d+(?:\.\d+)? Match € followed by digits and an optional decimal part
  • ) Close non capture group


For example

import re

regex = r"(?:\d+€\d*|€\d+(?:\.\d+)?)"

test_str = ("\"The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5 \n"
            "from last monday. If they do not level up again to 100€ by the end of this week there might \n"
            "be serious consequences to the company\"")

print([x.replace("€", "") for x in re.findall(regex, test_str)])

Output

['89.99', '95', '100']

A slightly more precise pattern, which requires a 2-digit decimal part after the € sign and allows optional groups of a comma followed by 3 digits, could be:

(?:\d+€\d*|€\d{1,3}(?:,\d{3})*\.\d{2})
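As a rough check of this stricter pattern, the sketch below runs it on the text from the question plus one made-up amount with a thousands separator (`€1,234.56` is an assumption added for illustration and is not in the original text). The euro sign and the commas are stripped afterwards, in the same way as in the example above.

```python
import re

# Stricter pattern: the "€ first" branch needs exactly two decimals
# and allows comma-separated groups of three digits.
regex = r"(?:\d+€\d*|€\d{1,3}(?:,\d{3})*\.\d{2})"

test_str = ("The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5 \n"
            "from last monday. If they do not level up again to 100€ by the end of this week there might \n"
            "be serious consequences to the company. "
            "Hypothetical extra amount: €1,234.56")  # last sentence is made up

matches = re.findall(regex, test_str)
print(matches)

# Drop the euro sign and the thousands separators to keep only the number
numbers = [m.replace("€", "").replace(",", "") for m in matches]
print(numbers)
```

Note that with this pattern an amount such as `€1234.56` (four digits without a comma) would not match the second branch at all, so it is only a fit if the input really uses comma grouping.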


Upvotes: 1

Egor Dementyev

Reputation: 68

A simpler solution would be: [.\d]*€[.\d]*
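A quick sketch of how this pattern behaves on the text from the question. One caveat that may matter for other inputs: because `.` is part of the character class, an amount at the very end of a sentence keeps its trailing period. The sentence `"The target is 100€."` and the `clean` helper below are made up for illustration.

```python
import re

pattern = r"[.\d]*€[.\d]*"

text = """The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5 
from last monday. If they do not level up again to 100€ by the end of this week there might 
be serious consequences to the company"""

found = re.findall(pattern, text)
print(found)

# Hypothetical sentence-final amount: the period is captured as well
edge = re.findall(pattern, "The target is 100€.")
print(edge)


def clean(match):
    """Remove the euro sign and any trailing period picked up by the pattern."""
    return match.replace("€", "").rstrip(".")


print([clean(m) for m in found + edge])
```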

Upvotes: 1

Daweo

Reputation: 36590

This needs further testing, but I would simply grab every run of non-whitespace characters around the € sign, that is:

import re
text = """The shares of the stock at the XKI Market fell by €89.99 today, which saw a drop of a 9€5 
from last monday. If they do not level up again to 100€ by the end of this week there might 
be serious consequences to the company"""
values = re.findall(r"\S*€\S*", text)
print(values)

Output:

['€89.99', '9€5', '100€']
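To get from these tokens to the numbers asked for in the question, one possible follow-up step is to drop every character that is not a digit or a dot. This is only a sketch: `to_number` is a hypothetical helper name, and it assumes that `.` is the decimal separator, that there are no thousands separators, and that every token contains at least one digit.

```python
import re
from decimal import Decimal

values = ['€89.99', '9€5', '100€']


def to_number(token):
    """Keep only digits and dots, drop a trailing dot, and parse the rest."""
    digits = re.sub(r"[^\d.]", "", token).rstrip(".")
    return Decimal(digits)


numbers = [to_number(v) for v in values]
print([str(n) for n in numbers])
```

`Decimal` is used here instead of `float` so that amounts such as 89.99 are kept exactly as written; a token with no digits at all (for example a lone `€`) would raise an error and needs separate handling.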

Upvotes: 0
