Reputation: 25
With my program I want to be able to take a picture of any receipt and extract certain information, one piece being the price.
My input is the following:
----------
BT em <br/>
SCHWEINFURT _OSKAR-VON-MILLER-STR.6 <br/>
RADIESCHEN **0,59** <br/>
KAESEAUFSCH. **1.39** <br/>
BAUCHSPECK **1,19** <br/>
BAUCHSPECK **1,19** <br/>
DORNFELDER **0,99**<br/>
CLEMENTINEN **2,49**<br/>
L&M BLUE **3,50**<br/>
L&M BLUE **3,50**<br/>
SUMME EUR **14,84** *<br/>
BAR **50,00**<br/>
RUCKGELD EUR **35,16**<br/>
“ENTHALTENE MEHRWERTSTEUER A<br/>
MWST NETTO<br/>
**7,00** % **0,45** **6,40**<br/>
**19,00** % **1,28** **6,71**<br/>
SUMME MWST **1,73** **13,11**<br/>
EDEKA HANDELSGFSELLSCHAFT<br/>
NORDBAYERN-SACHSEN-THURINGEN MBH<br/>
STEUERNUMMER: 257/115/30471<br/>
QUITTUNG<br/>
NUTZEN SIE DIF EDECARD<br/>
PUNKTE_SAMMELN+PRAMIEN ERWERBEN<br/>
THR EINKAUF WARE UNS<br/>
1 BONUSPUNKTE WERT GEWESEN !<br/>
08.12.07 16:27 37589 48 4 8500<br/>
FS BEDIENTE STE: H. SEUFERT :<br/>
VIELEN DANK FÜR IHREN EINKAUF!<br/>
AUF WIEDERSEHEN IM E-CENTER<br/>
UNSERE ÖFFNUNGSZEITEN FÜR SIE:<br/>
MONTAG-SAMSTAG: 0800-20 . 00UER<br/>
The information I want to obtain is in bold.
First I tried the following RegExp:
/(([\d]{1,2})(\,|\.)[\d]{2})/g
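I chose this one because:
- I am looking for more than one match, therefore / (...) /g
- [\d]{1,2} - one or two digits before the separator
- (\,|\.) - a comma or a dot as the separator
- [\d]{2} - exactly two digits after the separator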
As you can see, part of the date is a match, which I don't want. Right now I don't mind that the part after MWST NETTO matches.
My idea was to look for the dot, so I tried adding [^.] before and after my RegExp.
As you can see, my problem is still there. I also don't understand why 6,40 and 6,71 are no longer matches, as there is no dot before or after them.
Does anyone have an idea what to try next? I was thinking about an AND statement: I would use my first RegExp and then exclude anything that looks like a date. But I'm not sure how to do that.
I would really appreciate any tips or ideas you have. If anything is unclear or you need more information, please do not hesitate to ask.
Upvotes: 1
Views: 244
Reputation: 163362
One way could be to use an alternation to match the format that you don't want and then capture in a group what you do want:
\d+\.\d+\.\d+|(\d{1,2}[.,]\d{2})
Explanation
- \d+\.\d+\.\d+ - Match the pattern that you don't want to capture (or, for example, \d{2}\.\d{2}\.\d{2} if you want to be more specific)
- | - Or
- (\d{1,2}[.,]\d{2}) - Capture in a group 1 or 2 digits, a comma or dot, and then 2 digits

const regex = /\d+\.\d+\.\d+|(\d{1,2}[.,]\d{2})/g;
const str = `BT em
SCHWEINFURT _OSKAR-VON-MILLER-STR.6
RADIESCHEN 0,59
KAESEAUFSCH. 1.39
BAUCHSPECK 1,19
BAUCHSPECK 1,19
DORNFELDER 0,99
CLEMENTINEN 2,49
L&M BLUE 3,50
L&M BLUE 3,50
SUMME EUR 14,84 *
BAR 50,00
RUCKGELD EUR 35,16
“ENTHALTENE MEHRWERTSTEUER A
MWST NETTO
7,00 % 0,45 6,40
19,00 % 1,28 6,71
SUMME MWST 1,73 13,11
EDEKA HANDELSGFSELLSCHAFT
NORDBAYERN-SACHSEN-THURINGEN MBH
STEUERNUMMER: 257/115/30471
QUITTUNG
NUTZEN SIE DIF EDECARD
PUNKTE_SAMMELN+PRAMIEN ERWERBEN
THR EINKAUF WARE UNS
1 BONUSPUNKTE WERT GEWESEN !
08.12.07 16:27 37589 48 4 8500
FS BEDIENTE STE: H. SEUFERT :
VIELEN DANK FÜR IHREN EINKAUF!
AUF WIEDERSEHEN IM E-CENTER
UNSERE ÖFFNUNGSZEITEN FÜR SIE:
MONTAG-SAMSTAG: 0800-20 . 00UER`;
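let m;
while ((m = regex.exec(str)) !== null) {
  // Guard against an infinite loop on zero-length matches.
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // Group 1 is only set when the price alternative matched, not the date one.
  if (m[1]) {
    console.log(m[1]);
  }
}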
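If the prices are needed as numbers rather than strings, the same pattern can be wrapped in a small function. This is only a sketch: the name extractPrices is hypothetical, and it assumes that every value uses a comma or dot as the decimal separator and that no thousands separators occur.

```javascript
// Hypothetical helper, not part of the original answer.
function extractPrices(text) {
  // The first alternative swallows date-like runs (e.g. 08.12.07) without capturing;
  // the second alternative captures a price in group 1.
  const regex = /\d+\.\d+\.\d+|(\d{1,2}[.,]\d{2})/g;
  const prices = [];
  for (const match of text.matchAll(regex)) {
    if (match[1] !== undefined) {
      // Normalize the decimal comma so parseFloat can read the value.
      prices.push(parseFloat(match[1].replace(",", ".")));
    }
  }
  return prices;
}

const sample =
  "RADIESCHEN 0,59\nKAESEAUFSCH. 1.39\nSUMME EUR 14,84 *\n08.12.07 16:27 37589 48 4 8500";
console.log(extractPrices(sample)); // [ 0.59, 1.39, 14.84 ]
```

The date is still consumed by the regex, which is what prevents 8.12 or 2.07 from being picked up as a price afterwards.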
Upvotes: 1
Reputation: 626893
You may use
/(?:^|[^.\d])(\d{1,2}[,.]\d{2})(?![.\d])/g
and grab the contents of Group 1. See the regex demo.
Details
- (?:^|[^.\d]) - start of string or any char other than . and digit
- (\d{1,2}[,.]\d{2}) - Group 1: 1 or 2 digits, then . or , and then two digits
- (?![.\d]) - no . or digit immediately to the right is allowed.

var text = "BT em \r\nSCHWEINFURT _OSKAR-VON-MILLER-STR.6 \r\nRADIESCHEN 0,59 \r\nKAESEAUFSCH. 1.39 \r\nBAUCHSPECK 1,19 \r\nBAUCHSPECK 1,19 \r\nDORNFELDER 0,99\r\nCLEMENTINEN 2,49\r\nL&M BLUE 3,50\r\nL&M BLUE 3,50\r\nSUMME EUR 14,84 *\r\nBAR 50,00\r\n\r\nRUCKGELD EUR 35,16\r\n“ENTHALTENE MEHRWERTSTEUER A\r\nMWST NETTO\r\n7,00 % 0,45 6,40\r\n19,00 % 1,28 6,71\r\nSUMME MWST 1,73 13,11\r\nEDEKA HANDELSGFSELLSCHAFT\r\nNORDBAYERN-SACHSEN-THURINGEN MBH\r\nSTEUERNUMMER: 257/115/30471\r\nQUITTUNG\r\nNUTZEN SIE DIF EDECARD\r\nPUNKTE_SAMMELN+PRAMIEN ERWERBEN\r\nTHR EINKAUF WARE UNS\r\n1 BONUSPUNKTE WERT GEWESEN !\r\n08.12.07 16:27 37589 48 4 8500\r\nFS BEDIENTE STE: H. SEUFERT :\r\nVIELEN DANK FÜR IHREN EINKAUF!\r\nAUF WIEDERSEHEN IM E-CENTER\r\nUNSERE ÖFFNUNGSZEITEN FÜR SIE:\r\nMONTAG-SAMSTAG: 0800-20 . 00UER";
var rx = /(?:^|[^.\d])(\d{1,2}[,.]\d{2})(?![.\d])/g;
var m, res = [];
while (m = rx.exec(text)) {
  res.push(m[1]);
}
console.log(res);
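As for why 6,40 and 6,71 disappeared once [^.] was added on both sides: a character class consumes the character it matches, so the space after 0,45 is already used up and is no longer available as the leading character for 6,40. A lookahead only inspects the next character without consuming it. The comparison below is a sketch; the exact shape of the [^.] attempt is an assumption (it was not posted), and groupOne is a hypothetical helper.

```javascript
const lines = "MWST NETTO\n7,00 % 0,45 6,40\n08.12.07 16:27";

// Hypothetical helper: collect group 1 of every match.
function groupOne(rx, text) {
  return Array.from(text.matchAll(rx), (m) => m[1]);
}

// Assumed shape of the attempt with [^.] on both sides.
// The trailing [^.] consumes the character after each match.
const consuming = groupOne(/[^.](\d{1,2}[,.]\d{2})[^.]/g, lines);

// The pattern from this answer: the lookahead checks the next
// character but leaves it available for the following match.
const lookahead = groupOne(/(?:^|[^.\d])(\d{1,2}[,.]\d{2})(?![.\d])/g, lines);

console.log(consuming); // [ '7,00', '0,45', '2.07' ]
console.log(lookahead); // [ '7,00', '0,45', '6,40' ]
```

The consuming version also still picks 2.07 out of the date, because [^.] happily matches the digit 1 in front of it; excluding digits as well as dots on the left is what prevents that.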
Upvotes: 1