Reputation: 2750
I need a regular expression that selects only the amounts in euros, so the value must be followed by € or euros. After the comma come the cents, and the amount can also contain spaces or dots.
7 967 59 €
- 9847, 48 euros à titre de rappel de salaire sur le bonus de l'année 2012,
- 1929, 78 euros à titre de rappel de salaire sur le bonus de l'année 2013,
- 129 689, 78 euros à titre de solde d'indemnité conventionnelle de licenciement,
- 1098 euros au titre du paiement du DIF,
é à 20 892, 05 euros, il ressort des pi
le de 27 084, 26 euros
ée à 26 395, 10 euros, hors bo
de 129 689, 78 euros,
6.000 € au titre des dommages et intérêts pour licenciement sans cause réelle et sérieuse,
1.510 € au titre de l'indemnité compensatrice de préavis,
151 € au titre des congés payés y afférents, 739 € au titre de l'indemnité de licenciement,
656,19 € au titre de l'indemnité due au titre de la non rémunération de la période de mise à pied conservatoire,
65,61 € au titre des congés payés afférents,
2.000 € au titre de 59 € au titre de <span class="highlight_underline">l'indemnité légale de licenciement</span>
2014,7 967, 59 € au titre de <span class="highlight_underline">l'indemnité légale de licenciement</span>
rappel de salaires de janvier 2007 au 7 mars 2007 3.708,34 €
SECTION B N° 419 425 426 427 428 429 430 432 433 434 436 441 442 443 444 446 467 571 572
I came up with this:
(\d.+\d+)(?:\s(?:euros?|€))
But it isn't as accurate as it should be.
Can someone help me?
EDIT:
@Wiktor Stribiżew gave me :
(\d[\d.\s,]*)(?:\s(?:euro|€))
which is close, but with this example:
2014,7 967, 59 €
it also takes the 2014,
and with
49715 11000158926 101,30 €
it takes 49715 11000158926. Numbers are limited to groups of 3 digits.
And with
2007 3.708,34 €
it shouldn't take the 2007 either.
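The over-matching described above can be reproduced with a short sketch. It assumes the standard-library `re` module rather than `regex`, and the name `LOOSE` is only illustrative:

```python
import re

# Pattern quoted in the edit above, written as a raw string.
LOOSE = re.compile(r"(\d[\d.\s,]*)(?:\s(?:euro|€))")

samples = [
    "2014,7 967, 59 €",
    "49715 11000158926 101,30 €",
    "2007 3.708,34 €",
]

# The character class [\d.\s,]* runs across digits, dots, commas and spaces
# alike, so each capture starts at the first digit of the line, not at the
# start of the amount.
captures = [LOOSE.findall(s) for s in samples]
for sample, capture in zip(samples, captures):
    print(repr(sample), "->", capture)
```

In each case the unwanted leading number ends up inside the captured group, which is the behaviour that needs to be ruled out.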
Edit 2:
Thanks for the answer, but it doesn't seem to work in my Python script:
import regex
import pandas as pd
sentences_pd = pd.read_csv('sampled_amounts.csv', names=["text"])
sentences_pd.head()
print([(regex.findall("\b((?:\d+|\d{1,3}(?:[,.\s]\d{3})*)(?:[,.\s]*\d+)?)\s(?:euros?|€)", x)) for x in sentences_pd['text']])
the text column looks like:
It gives me a list of empty lists:
[[], [], [], ..., []]
Upvotes: 2
Views: 2382
Reputation: 123
In case it helps, I created this regex for Spanish prices (€). The conditions are:
1.- The decimal separator must be followed by exactly 2 decimals
2.- The decimals cannot be "00"
3.- A thousands separator is not allowed. In my case the prices do not exceed 999 €
4.- Leading and trailing spaces are not allowed
5.- A leading "0" is not allowed in front of a whole number
Regex: ^((0\,(?!00)\d{2})|([1-9]\d*(\,(?!00)\d{2})?)|0)$
Allowed values:
Values not allowed:
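A quick way to check the conditions above is to run the pattern against a few strings. This is a minimal sketch using Python's standard `re` module. The helper name `is_valid_price` and the sample strings are illustrative assumptions, not the examples from the original post:

```python
import re

# The pattern from this answer, unchanged, as a raw string.
PRICE = re.compile(r"^((0\,(?!00)\d{2})|([1-9]\d*(\,(?!00)\d{2})?)|0)$")


def is_valid_price(text):
    """Return True when the whole string is a price accepted by PRICE."""
    return PRICE.match(text) is not None


# Illustrative inputs only (assumed, not taken from the answer).
accepted = ["0", "7", "12,34", "0,50", "999,99"]
rejected = ["0,00", "12,00", "012", "12,5", "1.000,50", " 12", "12 "]

for value in accepted + rejected:
    print(repr(value), is_valid_price(value))
```

Note that the pattern itself does not cap the value at 999: a string such as 1000 is still accepted, so the upper limit in condition 3 comes from the data, not from the regex.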
Upvotes: 0
Reputation: 626950
You may use
\b((?:\d+|\d{1,3}(?:[,.\s]\d{3})*)(?:[,.\s]*\d+)?)\s(?:euros?|€)
See the regex demo
Details
\b - a word boundary
((?:\d+|\d{1,3}(?:[,.\s]\d{3})*)(?:[,.\s]*\d+)?) - Group 1:
  (?: - an alternation group start
    \d+ - 1+ digits
    | - or
    \d{1,3} - 1 to 3 digits
    (?:[,.\s]\d{3})* - 0+ sequences of
      [,.\s] - 1 whitespace, , or .
      \d{3} - 3 digits
  ) - end of the alternation group
  (?:[,.\s]*\d+)? - an optional group of
    [,.\s]* - 0+ whitespaces, , or .
    \d+ - 1 or more digits
\s - a whitespace
(?:euros?|€) - either euro, euros or €
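The empty lists reported in Edit 2 of the question are most likely caused by the pattern being passed as a regular string literal: in Python, "\b" inside a normal string is a backspace character, not a word boundary, so the pattern has to be written as a raw string (r"..."). Below is a minimal sketch, assuming the standard-library `re` module behaves like `regex` for this pattern. The helper name `find_amounts` is hypothetical:

```python
import re

# Raw string: r"\b" is a backslash followed by "b", which the regex engine
# reads as a word boundary. The plain literal "\b" is a single backspace
# character and never occurs in the sample text.
PATTERN = re.compile(
    r"\b((?:\d+|\d{1,3}(?:[,.\s]\d{3})*)(?:[,.\s]*\d+)?)\s(?:euros?|€)"
)


def find_amounts(text):
    """Return the numeric part (Group 1) of every euro amount in text."""
    return PATTERN.findall(text)


# Lines taken from the question's sample data.
samples = [
    "656,19 € au titre de l'indemnité due",
    "2014,7 967, 59 € au titre de",
    "49715 11000158926 101,30 €",
    "rappel de salaires de janvier 2007 au 7 mars 2007 3.708,34 €",
    "SECTION B N° 419 425 426 427 428",
]

for line in samples:
    print(find_amounts(line))
```

With the raw string in place, the leading 2014, the 49715 11000158926 run and the 2007 from the question's counterexamples are left out of the captured group.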
Upvotes: 3