Baptiste Arnaud

Reputation: 2750

Regex for amounts in euro

I need a regex that selects only the amounts in euros: the value must be followed by € or "euros", and the cents come after the comma. The amounts can also contain spaces or dots.

7 967  59 €
- 9847, 48 euros à titre de rappel de salaire sur le bonus de l'année 2012,
 - 1929, 78 euros à titre de rappel de salaire sur le bonus de l'année 2013,
  - 129 689, 78 euros à titre de solde d'indemnité conventionnelle de licenciement,
- 1098 euros au titre du paiement du DIF,
é à 20 892, 05 euros, il ressort des pi
le de 27 084, 26 euros
ée à 26 395, 10 euros, hors bo
 de 129 689, 78 euros,
6.000 € au titre des dommages et intérêts pour licenciement sans cause réelle et sérieuse,
 1.510 € au titre de l'indemnité compensatrice de préavis,
 151 € au titre des congés payés y afférents, 739 € au titre de l'indemnité de licenciement,
 656,19 € au titre de l'indemnité due au titre de la non rémunération de la période de mise à pied conservatoire,
 65,61 € au titre des congés payés afférents,
 2.000 € au titre de  59 € au titre de <span class="highlight_underline">l'indemnité légale de licenciement</span>
2014,7 967, 59 € au titre de <span class="highlight_underline">l'indemnité légale de licenciement</span>
rappel de salaires de janvier 2007 au 7 mars 2007 3.708,34 €
SECTION B N° 419 425 426 427 428 429 430 432 433 434 436 441 442 443 444 446 467 571 572

I came up with this:

(\d.+\d+)(?:\s(?:euros?|€))

But it isn't as accurate as it should be.

Can someone help me?
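To illustrate the inaccuracy, here is a quick check of the pattern above with Python's standard `re` module, run on one of the sample lines (an added illustration, not part of the original post). `.+` is greedy and crosses words, so the capture starts at the first digit on the line and runs to the last digits before ` €`.

```python
import re

# The pattern from the question, written as a raw string.
ORIGINAL = r"(\d.+\d+)(?:\s(?:euros?|€))"

line = "rappel de salaires de janvier 2007 au 7 mars 2007 3.708,34 €"

# .+ starts right after the first digit of "2007" and backtracks only as far as
# needed to leave one digit for \d+ before " €".
print(re.findall(ORIGINAL, line))  # ['2007 au 7 mars 2007 3.708,34']
```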

EDIT:

@Wiktor Stribiżew gave me:

(\d[\d.\s,]*)(?:\s(?:euro|€))

which is close, but with this example:

2014,7 967, 59 €

it also captures the 2014,

with 49715 11000158926 101,30 €

it captures 49715 11000158926 as well, although digit groups in an amount have at most 3 digits,

and with 2007 3.708,34 €

it shouldn't capture the 2007 either.
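The three failing cases can be reproduced with Python's `re` module (an illustration added here; the input strings are copied from the edit above). Because `[\d.\s,]*` accepts any mix of digits, dots, commas and whitespace, the capture starts at the first digit of the run and keeps the unwanted leading number.

```python
import re

# First suggestion from the comments, as quoted above.
FIRST_TRY = r"(\d[\d.\s,]*)(?:\s(?:euro|€))"

cases = [
    "2014,7 967, 59 €",
    "49715 11000158926 101,30 €",
    "2007 3.708,34 €",
]

for text in cases:
    # Each result still includes the leading number that should be excluded.
    print(re.findall(FIRST_TRY, text))
```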

Edit 2:

Thanks for the answer, but it doesn't seem to work in my Python script:

import pandas as pd
import regex
sentences_pd = pd.read_csv('sampled_amounts.csv', names=["text"])
sentences_pd.head()
print([(regex.findall("\b((?:\d+|\d{1,3}(?:[,.\s]\d{3})*)(?:[,.\s]*\d+)?)\s(?:euros?|€)", x)) for x in sentences_pd['text']])

The text column looks like this: [screenshot of the text column, not included]

It gives me only empty lists, one per row:

[[], [], [], [], [], ..., [], []]
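A likely explanation (not stated in the original thread): in an ordinary Python string literal, `\b` is the backspace character (U+0008), so the regex engine receives a literal backspace instead of a word boundary and never matches normal text. `\d` and `\s` survive only because they are not recognised string escapes. The sketch below uses the standard `re` module and a short hypothetical list standing in for `sentences_pd['text']`; the same raw-string fix should apply to `regex.findall`.

```python
import re

# r"..." keeps the backslash, so the engine sees the \b word boundary.
AMOUNT = r"\b((?:\d+|\d{1,3}(?:[,.\s]\d{3})*)(?:[,.\s]*\d+)?)\s(?:euros?|€)"

# What the non-raw literal in the script produces: "\b" turns into "\x08".
BROKEN = "\x08" + AMOUNT[2:]

# Hypothetical stand-in for the DataFrame column (lines copied from the question).
texts = [
    " 656,19 € au titre de l'indemnité due",
    "- 1098 euros au titre du paiement du DIF,",
]

print([re.findall(BROKEN, x) for x in texts])  # every list is empty
print([re.findall(AMOUNT, x) for x in texts])  # [['656,19'], ['1098']]
```

With pandas, `sentences_pd['text'].str.findall(AMOUNT)` should return the same per-row lists.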

Upvotes: 2

Views: 2382

Answers (2)

Frank Mascarell

Reputation: 123

In case it helps, I created this regex for Spanish prices (€). The conditions are:

1.- A decimal comma followed by exactly 2 decimals (the decimal part is optional)
2.- The decimals cannot be "00"
3.- A thousands separator (.) is not allowed. In my case, prices do not exceed 999 €
4.- Leading and trailing spaces are not allowed
5.- A leading "0" is not allowed in front of a whole number

Regex: ^((0\,(?!00)\d{2})|([1-9]\d*(\,(?!00)\d{2})?)|0)$

Allowed values:

  • 0
  • 1234
  • 0,10
  • 12,34

Values not allowed:

  • 0,00
  • 0,1
  • " 1234" (space at the beginning)
  • "1234 " (space at the end)
  • 12,00
  • 01,23
  • 12,345
  • 1.234
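The two lists above can be checked with Python's `re` module (an added illustration; the pattern is copied verbatim from this answer). `fullmatch` is used so that a trailing newline, which a bare `$` would tolerate, is also rejected.

```python
import re

PRICE = re.compile(r"^((0\,(?!00)\d{2})|([1-9]\d*(\,(?!00)\d{2})?)|0)$")

allowed = ["0", "1234", "0,10", "12,34"]
rejected = ["0,00", "0,1", " 1234", "1234 ", "12,00", "01,23", "12,345", "1.234"]

for value in allowed + rejected:
    # True only when the whole string is a valid price under the rules above.
    print(repr(value), bool(PRICE.fullmatch(value)))
```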

Upvotes: 0

Wiktor Stribiżew

Reputation: 626950

You may use

\b((?:\d+|\d{1,3}(?:[,.\s]\d{3})*)(?:[,.\s]*\d+)?)\s(?:euros?|€)

See the regex demo

Details

  • \b - a word boundary
  • ((?:\d+|\d{1,3}(?:[,.\s]\d{3})*)(?:[,.\s]*\d+)?) - Group 1
    • (?: - an alternation group start
      • \d+ - 1+ digits
      • | - or
      • \d{1,3} - 1 to 3 digits
      • (?:[,.\s]\d{3})* - 0+ sequences of
        • [,.\s] - a whitespace, a comma or a dot
        • \d{3} - 3 digits
    • ) - end of the alternation group
    • (?:[,.\s]*\d+)? - an optional group of
      • [,.\s]* - 0+ whitespaces, commas or dots
      • \d+ - 1 or more digits
  • \s - a whitespace
  • (?:euros?|€) - either euro, euros or €
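Run with Python's `re` module (an illustration added here, not part of the original answer), the pattern keeps only the trailing amount in the problem lines from the question's edit. The double space in `7 967  59 €` is absorbed by `[,.\s]*`, and the leading `2014,`, `49715 11000158926` and `2007` are dropped because no match starting there can reach ` €`.

```python
import re

AMOUNT = re.compile(r"\b((?:\d+|\d{1,3}(?:[,.\s]\d{3})*)(?:[,.\s]*\d+)?)\s(?:euros?|€)")

samples = [
    "2014,7 967, 59 €",
    "49715 11000158926 101,30 €",
    "rappel de salaires de janvier 2007 au 7 mars 2007 3.708,34 €",
    "7 967  59 €",
]

for text in samples:
    # findall returns the contents of Group 1, i.e. the amount without the currency.
    print(AMOUNT.findall(text))
```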

Upvotes: 3
