Joshua Swiss

Reputation: 305

Regex to find string between two strings, excluding outer strings

I know this has been asked a thousand times before, but I could not get any of the previous solutions to work for my case. I'm trying to use a regex in JavaScript to parse a text file. The bit I'm trying to extract is the monetary figure, in a format like 55,555.00. The number of digits varies throughout the text file, as do the boundary characters and the amount of whitespace.

I wrote the following to extract what I need from the sample text below:

/((\w\s{10,20})([0-9]{8,}(?=.*[,.]))/g

Sample text:

                  23205        - Grants Current-County Operatin                        4,425,327.00"

"    4   0000047387         Central Equatoria State          1003-1478 Sta Hosp Oper Oct                   85,784.00"
"    4   0000047442         EASTERN EQUATORIA ST             1003-1479 Sta Hosp Oper Oct                   93,137.00"
"    4   0000047485         JONGLEI STATE                    1003-1519 Sta Hosp Oper Oct                  144,608.00"
"    4   0000047501         Lakes State                      1003-1482 Sta Hosp Oper Oct                   93,137.00"
"    4   0000047528         Unity State                      1003-1484 Sta Hosp Oper Oct                   75,980.00"
"    4   0000047532         Northern Bahr-el State           1003-1483 Sta Hosp Oper Oct                   58,824.00"
"    4   0000047615         Western E State                  1003-1488 Sta Hosp Oper Oct                   93,137.00"
"    4   0000047638         Warap State                      1003-1486 Sta Hosp Oper Oct                   51,471.00"
"    4   0000047680         Upper Nile State                 1003-1485 Sta Hosp Oper Oct                  102,941.00"
"    4   0000047703         Western BG State                 1003-1487 Sta Hosp Oper Oct                   34,314.00"
                                                                                             ----------------------
"        Total For Period          4                                                                      833,333.00"
 ----------------------------------------------------------------------------------------------------------------------------
 Fiscal Year        2015/16                               Republic Of South Sudan                         Date     2015/11/20
 Period                   5                                                                               Time       12:58:40
                                                  FreeBalance Financial Management System                 Page              7
 ----------------------------------------------------------------------------------------------------------------------------
                                                            Vendor Analysis Report

                                                              1091 Health (MOH)
  Prd   Voucher #          Vendor Name                      Description                          Amount
  ---   ----------------   ------------------------------   -----------------------------    ----------------------
                                                                                             ----------------------
"  

Here's an example: https://regex101.com/r/nO8nM1/4

The issue is the leading boundary. I am able to exclude the closing boundary (the double quote), but I can't get rid of the leading boundary. I've gotten a couple of things sort of working, but they also matched the two figures outside the main table (in this case 4,425,327.00 and 833,333.00).

Any help would be much appreciated.

Upvotes: 1

Views: 588

Answers (1)

Wiktor Stribiżew

Reputation: 627082

To match float values with a mandatory decimal fraction and a comma (,) as the digit grouping symbol, you can use

\d+(?:,\d{3})*\.\d+

See demo

Explanation:

  • \d+ - 1 or more digits
  • (?:,\d{3})* - 0 or more sequences of
    • , - a comma
    • \d{3} - exactly 3 digits
  • \. - a literal period/dot
  • \d+ - 1 or more digits.
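As a quick check of the breakdown above, here is a minimal sketch that runs the pattern against a few strings. The sample strings and the names `amountRe`, `samples` and `found` are made up for illustration and are not part of the original post.

```javascript
// Pattern from the answer: digits, optional ",ddd" groups, mandatory decimal part.
const amountRe = /\d+(?:,\d{3})*\.\d+/g;

// Hypothetical inputs: three amounts and two report fields that are not amounts.
const samples = ['85,784.00', '4,425,327.00', '12.5', '1003-1478', '2015/11/20'];

// With the g flag, String.prototype.match returns all matches or null.
const found = samples.map((s) => s.match(amountRe));

console.log(found);
// [ [ '85,784.00' ], [ '4,425,327.00' ], [ '12.5' ], null, null ]
```

Note that the pattern is not anchored, so on its own it matches every such number in the text, including the totals outside the table.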

To only get the values that appear after Oct, you may use a regex that is a mix of the pattern above and yours:

\w\s{10,20}(\d+(?:,\d{3})*\.\d+)

See another demo

The \w\s{10,20} part matches one word character (\w: a letter, digit, or underscore) followed by 10 to 20 whitespace characters. Only after that does the pattern match the float value and capture it into Group 1.
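To illustrate why the width of the gap matters, the sketch below builds lines with an explicit number of spaces before the amount. The helper names (`line`, `amountOf`) and the gap widths of 19, 70 and 5 are assumptions chosen for the demonstration, not values measured from the report.

```javascript
// Same pattern as above, without the g flag since each line is tested once.
const rowRe = /\w\s{10,20}(\d+(?:,\d{3})*\.\d+)/;

// Hypothetical helper: label, then `gap` spaces, then the amount.
const line = (label, gap, amount) => label + ' '.repeat(gap) + amount;

// Returns the captured amount (Group 1) or null when the line does not match.
const amountOf = (s) => {
  const m = rowRe.exec(s);
  return m ? m[1] : null;
};

const inRange = line('Sta Hosp Oper Oct', 19, '85,784.00');   // gap within 10..20
const tooWide = line('Period 4', 70, '833,333.00');           // gap above 20
const tooNarrow = line('Sta Hosp Oper Oct', 5, '85,784.00');  // gap below 10

console.log(amountOf(inRange));   // 85,784.00
console.log(amountOf(tooWide));   // null
console.log(amountOf(tooNarrow)); // null
```

The leading word character and the whitespace are part of the overall match (m[0]) but not of Group 1, which is how the outer boundary gets excluded from the extracted value.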

See JS snippet below (m[1] is where the float value resides):

var re = /\w\s{10,20}(\d+(?:,\d{3})*\.\d+)/gm; 
var str = '                  23205        - Grants Current-County Operatin                        4,425,327.00"\n\n"    4   0000047387         Central Equatoria State          1003-1478 Sta Hosp Oper Oct                   85,784.00"\n"    4   0000047442         EASTERN EQUATORIA ST             1003-1479 Sta Hosp Oper Oct                   93,137.00"\n"    4   0000047485         JONGLEI STATE                    1003-1519 Sta Hosp Oper Oct                  144,608.00"\n"    4   0000047501         Lakes State                      1003-1482 Sta Hosp Oper Oct                   93,137.00"\n"    4   0000047528         Unity State                      1003-1484 Sta Hosp Oper Oct                   75,980.00"\n"    4   0000047532         Northern Bahr-el State           1003-1483 Sta Hosp Oper Oct                   58,824.00"\n"    4   0000047615         Western E State                  1003-1488 Sta Hosp Oper Oct                   93,137.00"\n"    4   0000047638         Warap State                      1003-1486 Sta Hosp Oper Oct                   51,471.00"\n"    4   0000047680         Upper Nile State                 1003-1485 Sta Hosp Oper Oct                  102,941.00"\n"    4   0000047703         Western BG State                 1003-1487 Sta Hosp Oper Oct                   34,314.00"\n                                                                                             ----------------------\n"        Total For Period          4                                                                      833,333.00"\n ----------------------------------------------------------------------------------------------------------------------------\n Fiscal Year        2015/16                               Republic Of South Sudan                         Date     2015/11/20\n Period                   5                                                                               Time       12:58:40\n                                                  FreeBalance Financial Management System                 Page              7\n ----------------------------------------------------------------------------------------------------------------------------\n                                                            Vendor Analysis Report\n\n                                                              1091 Health (MOH)\n  Prd   Voucher #          Vendor Name                      Description                          Amount\n  ---   ----------------   ------------------------------   -----------------------------    ----------------------\n                                                                                             ----------------------\n"  ';
var m;
 
while ((m = re.exec(str)) !== null) {
    document.getElementById("r").innerHTML += m[1] + "<br/>";
}
<div id="r"></div>
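Outside the browser, the same capture-group approach can collect the amounts into an array instead of writing to the page. This is a sketch under assumptions: the three-line `text` excerpt is a shortened, hypothetical stand-in for the report (with gap widths chosen for the demo), and `amounts` and `values` are illustrative names. `String.prototype.matchAll` requires the g flag and an ES2020-capable runtime.

```javascript
const re = /\w\s{10,20}(\d+(?:,\d{3})*\.\d+)/g;

// Hypothetical excerpt: two table rows and one total line with a much wider gap.
const text = [
  'Sta Hosp Oper Oct' + ' '.repeat(19) + '85,784.00"',
  'Sta Hosp Oper Oct' + ' '.repeat(18) + '144,608.00"',
  'Total For Period          4' + ' '.repeat(70) + '833,333.00"',
].join('\n');

// Keep only Group 1, so the leading word character and spaces are dropped.
const amounts = Array.from(text.matchAll(re), (m) => m[1]);

// Strip the grouping commas to get plain numbers.
const values = amounts.map((a) => Number(a.replace(/,/g, '')));

console.log(amounts); // [ '85,784.00', '144,608.00' ]
console.log(values);  // [ 85784, 144608 ]
```

The total line is skipped because no word character in it is followed by 10 to 20 whitespace characters and then a complete amount.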

Upvotes: 2
