Reputation: 305
I know this has been asked a thousand times before, but I could not get any of the previous solutions working for my case. I'm trying to use a regex in JavaScript to parse a text file. The bit I'm trying to extract is the monetary figure, with a format like 55,555.00. The number of digits can vary throughout the text file. Additionally, the boundary characters and spaces can vary.
I wrote the following to extract what I need from the sample code below:
/((\w\s{10,20})([0-9]{8,}(?=.*[,.]))/g
sample code:
23205 - Grants Current-County Operatin 4,425,327.00"
" 4 0000047387 Central Equatoria State 1003-1478 Sta Hosp Oper Oct 85,784.00"
" 4 0000047442 EASTERN EQUATORIA ST 1003-1479 Sta Hosp Oper Oct 93,137.00"
" 4 0000047485 JONGLEI STATE 1003-1519 Sta Hosp Oper Oct 144,608.00"
" 4 0000047501 Lakes State 1003-1482 Sta Hosp Oper Oct 93,137.00"
" 4 0000047528 Unity State 1003-1484 Sta Hosp Oper Oct 75,980.00"
" 4 0000047532 Northern Bahr-el State 1003-1483 Sta Hosp Oper Oct 58,824.00"
" 4 0000047615 Western E State 1003-1488 Sta Hosp Oper Oct 93,137.00"
" 4 0000047638 Warap State 1003-1486 Sta Hosp Oper Oct 51,471.00"
" 4 0000047680 Upper Nile State 1003-1485 Sta Hosp Oper Oct 102,941.00"
" 4 0000047703 Western BG State 1003-1487 Sta Hosp Oper Oct 34,314.00"
----------------------
" Total For Period 4 833,333.00"
----------------------------------------------------------------------------------------------------------------------------
Fiscal Year 2015/16 Republic Of South Sudan Date 2015/11/20
Period 5 Time 12:58:40
FreeBalance Financial Management System Page 7
----------------------------------------------------------------------------------------------------------------------------
Vendor Analysis Report
1091 Health (MOH)
Prd Voucher # Vendor Name Description Amount
--- ---------------- ------------------------------ ----------------------------- ----------------------
----------------------
"
Here's an example: https://regex101.com/r/nO8nM1/4
The issue is the leading boundary. I am able to exclude the closing boundary (double quotes), but I can't get rid of the leading boundary. I've gotten a couple things sort of working, but they included the two strings of digits outside the main tables (in this case 4,425,327.00 and 833,333.00).
Any help would be much appreciated.
Upvotes: 1
Views: 588
Reputation: 627082
To match float values with obligatory decimal fractions and , as a digit grouping symbol, you can use
\d+(?:,\d{3})*\.\d+
See demo
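As a quick sanity check of this pattern, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not taken from the report in the question.

```javascript
// Hypothetical sample text (an assumption, not copied from the report).
const sample = 'Oct 85,784.00" and 4,425,327.00 but not 1003-1478 or 2015/16';

// Digits, optional comma-separated groups of three, then a mandatory decimal part.
const numberRe = /\d+(?:,\d{3})*\.\d+/g;

// match() with the g flag returns all matches, or null when there are none.
const found = sample.match(numberRe) || [];

console.log(found); // [ '85,784.00', '4,425,327.00' ]
```

Values without a decimal part, such as 1003-1478 or 2015/16, are skipped because the pattern requires a literal dot followed by digits.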
Explanation:
\d+ - 1 or more digits
(?:,\d{3})* - 0 or more sequences of
  , - a comma
  \d{3} - exactly 3 digits
\. - a literal period/dot
\d+ - 1 or more digits.

To only get the values that appear after Oct, you may use a regex that is a mix of the pattern above and yours:
\w\s{10,20}(\d+(?:,\d{3})*\.\d+)
See another demo
The \w\s{10,20} part matches a word character (\w: a letter, digit, or underscore) and then 10 to 20 whitespace characters; only after that does the pattern match the float value and capture it into Group 1.
See the JS snippet below (m[1] is where the float value resides):
var re = /\w\s{10,20}(\d+(?:,\d{3})*\.\d+)/gm;
var str = ' 23205 - Grants Current-County Operatin 4,425,327.00"\n\n" 4 0000047387 Central Equatoria State 1003-1478 Sta Hosp Oper Oct 85,784.00"\n" 4 0000047442 EASTERN EQUATORIA ST 1003-1479 Sta Hosp Oper Oct 93,137.00"\n" 4 0000047485 JONGLEI STATE 1003-1519 Sta Hosp Oper Oct 144,608.00"\n" 4 0000047501 Lakes State 1003-1482 Sta Hosp Oper Oct 93,137.00"\n" 4 0000047528 Unity State 1003-1484 Sta Hosp Oper Oct 75,980.00"\n" 4 0000047532 Northern Bahr-el State 1003-1483 Sta Hosp Oper Oct 58,824.00"\n" 4 0000047615 Western E State 1003-1488 Sta Hosp Oper Oct 93,137.00"\n" 4 0000047638 Warap State 1003-1486 Sta Hosp Oper Oct 51,471.00"\n" 4 0000047680 Upper Nile State 1003-1485 Sta Hosp Oper Oct 102,941.00"\n" 4 0000047703 Western BG State 1003-1487 Sta Hosp Oper Oct 34,314.00"\n ----------------------\n" Total For Period 4 833,333.00"\n ----------------------------------------------------------------------------------------------------------------------------\n Fiscal Year 2015/16 Republic Of South Sudan Date 2015/11/20\n Period 5 Time 12:58:40\n FreeBalance Financial Management System Page 7\n ----------------------------------------------------------------------------------------------------------------------------\n Vendor Analysis Report\n\n 1091 Health (MOH)\n Prd Voucher # Vendor Name Description Amount\n --- ---------------- ------------------------------ ----------------------------- ----------------------\n ----------------------\n" ';
var m;
while ((m = re.exec(str)) !== null) {
document.getElementById("r").innerHTML += m[1] + "<br/>";
}
<div id="r"></div>
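If you would rather collect the amounts as numbers than write them to the page, the same regex can be used without the DOM. This is only a sketch: the two sample lines and their padding widths (12 and 40 spaces) are assumptions made for illustration, since the real column widths depend on your file.

```javascript
// Hypothetical lines with column padding preserved (widths are assumptions).
const lines = [
  '" 4 0000047387 Central Equatoria State 1003-1478 Sta Hosp Oper Oct' +
    ' '.repeat(12) + '85,784.00"',
  '" Total For Period 4' + ' '.repeat(40) + '833,333.00"',
].join('\n');

// A word character, 10 to 20 whitespace characters, then the captured amount.
const amountRe = /\w\s{10,20}(\d+(?:,\d{3})*\.\d+)/g;

const amounts = [];
for (const match of lines.matchAll(amountRe)) {
  // Drop the grouping commas before converting to a number.
  amounts.push(parseFloat(match[1].replace(/,/g, '')));
}

console.log(amounts); // [ 85784 ]
```

In this sketch the total line is skipped only because its padding exceeds 20 whitespace characters. If the totals in your file sit closer to the preceding text, the {10,20} range would need adjusting.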
Upvotes: 2