JadonR

Reputation: 193

Regex to get the word after specific match words

I am trying to pull the dollar amount from some invoices. I need the match to be the word directly after the word "TOTAL". The word may also appear with a colon after it (i.e. Total:). An example of the text is shown below:

4 Discover Credit Purchase - c REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q TC: mzQm 40.00 CHANGE 0.00 TOTAL NUMBER OF ITEMS SOLD = 0 12/23/17 Ql:38piii 414 9 76 1G6 THANK YOU FOR SHOPPING KR08ER Now Hiring - Apply Today!

In the case of the sample above, the match should be "40.00".

The Regex statement that I wrote:

(?<=total)([^\n\r]*)

pulls EVERYTHING after the word "total". I only want the very next word.
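A small C# sketch of the difference, assuming .NET's System.Text.RegularExpressions and case-insensitive matching (the pattern above is lowercase while the sample says TOTAL). `[^\n\r]*` consumes the rest of the line, whereas `\S+` stops at the first whitespace, so capturing `\S+` after the word, an optional colon and some whitespace yields only the next word. The helper `NextWordAfterTotal` is a hypothetical name used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened copy of the sample invoice text from the question
string sample = "4 Discover Credit Purchase - c REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q TC: mzQm 40.00 CHANGE 0.00";

// The original pattern, run case-insensitively (assumption): [^\n\r]* keeps going to the end of the line
string restOfLine = Regex.Match(sample, @"(?<=total)([^\n\r]*)", RegexOptions.IgnoreCase).Groups[1].Value;
Console.WriteLine(restOfLine); // ": 40.00 AID: 1523Q1Q TC: mzQm 40.00 CHANGE 0.00"

// Hypothetical helper: returns only the next whitespace-delimited word after TOTAL or TOTAL:
string? NextWordAfterTotal(string text)
{
    // \btotal\b  the whole word TOTAL (any case, via RegexOptions.IgnoreCase)
    // :?         an optional colon
    // \s+        one or more whitespace characters
    // (\S+)      group 1: the next run of non-whitespace characters
    Match m = Regex.Match(text, @"\btotal\b:?\s+(\S+)", RegexOptions.IgnoreCase);
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(NextWordAfterTotal(sample)); // 40.00
```

Regex.Match returns the first match, so a later "TOTAL NUMBER OF ITEMS" in the full text does not interfere.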

Upvotes: 3

Views: 3693

Answers (5)

JohnyL

Reputation: 7122

Explanations are in the regex pattern.

using System;
using System.Text.RegularExpressions;

string str = "4 Discover Credit Purchase - c REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q";
string pattern = @"(?ix)       # 'i' means case-insensitive search; 'x' allows whitespace and these comments
                    \b         # Word boundary
                    total      # 'TOTAL' or 'total' or any other combination of cases
                    :?         # Matches colon if it exists
                    \s+        # One or more whitespace characters
                    (\d+\.\d+) # Sought number, saved into group 1
                    \s         # One whitespace character";
// The number is in the first group: Groups[1]
Console.WriteLine(Regex.Match(str, pattern).Groups[1].Value);
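One caveat with the pattern above: the trailing `\s` requires a whitespace character after the number, so an amount at the very end of the input (e.g. "TOTAL: 40.00") is not matched, and `Groups[1].Value` then silently returns an empty string. A sketch of one way around that, assuming the same verbose .NET pattern with the final `\s` swapped for the lookahead `(?!\S)` (whitespace or end of input) plus a `Success` check; `AmountAfterTotal` is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Same verbose pattern as above, except the final \s is replaced by (?!\S)
string pattern = @"(?ix)       # case-insensitive, ignore pattern whitespace
                   \btotal     # the word TOTAL in any case
                   :?          # optional colon
                   \s+         # one or more whitespace characters
                   (\d+\.\d+)  # the amount, captured in group 1
                   (?!\S)      # next character is whitespace, or end of input";

// Hypothetical helper: returns null instead of "" when nothing matches
string? AmountAfterTotal(string text)
{
    Match m = Regex.Match(text, pattern);
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(AmountAfterTotal("REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q")); // 40.00
Console.WriteLine(AmountAfterTotal("TOTAL: 40.00"));                           // 40.00 (the \s version finds nothing here)
Console.WriteLine(AmountAfterTotal("TOTAL NUMBER OF ITEMS SOLD = 0") ?? "(no match)"); // (no match)
```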

Upvotes: 1

Bohemian

Reputation: 424983

This (unlike the other answers so far) matches only the total amount, i.e. without needing to examine groups:

((?<=\bTOTAL\b )|(?<=\bTOTAL\b: ))[\d.]+

See live demo matching when input has, and doesn’t have, the colon after TOTAL.

Two lookbehinds (which don’t consume input) are needed because, in many regex flavors, a lookbehind can’t have variable length. The optional colon is therefore handled by an alternation (a regex OR via ...|...) of two lookbehinds, one with and one without the colon.

If TOTAL can be in any case, add (?i) (the ignore case flag) to the start of the regex.
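A short C# check of the lookbehind approach, assuming .NET regex. Because the lookbehinds are not part of the match, `Match.Value` is the amount itself. .NET happens to allow variable-length lookbehind, so there the optional colon can also sit inside a single lookbehind (written here with `\s` instead of a literal space); engines with fixed-width lookbehind, such as Python's `re`, reject that form, which is where the two-lookbehind alternation is needed.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern above with (?i) prepended: two fixed-length lookbehinds, one per colon variant
string twoLookbehinds = @"(?i)((?<=\bTOTAL\b )|(?<=\bTOTAL\b: ))[\d.]+";

// .NET permits variable-length lookbehind, so the optional colon can live inside one lookbehind
string oneLookbehind = @"(?i)(?<=\bTOTAL\b:?\s)[\d.]+";

foreach (string input in new[] { "REF#: 02353R TOTAL: 40.00 AID", "total 12.50 CHANGE" })
{
    // No groups needed: Value is the amount itself
    Console.WriteLine($"{Regex.Match(input, twoLookbehinds).Value} {Regex.Match(input, oneLookbehind).Value}");
}
// Prints "40.00 40.00" and then "12.50 12.50"
```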

Upvotes: 3

The fourth bird

Reputation: 163207

What you could do is match total, followed by an optional colon :? and zero or more whitespace characters \s*, and then capture in a group one or more digits followed by an optional part that matches a dot and one or more digits.

To match both upper- and lowercase variants of total, you could make the match case-insensitive, for example by adding the inline modifier (?i) or by using a case-insensitive flag.

\btotal:?\s*(\d+(?:\.\d+)?)

The value 40.00 will be in group 1.
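A minimal C# sketch of this pattern, assuming .NET, where the case-insensitive flag is passed as `RegexOptions.IgnoreCase` rather than an inline `(?i)`. The sample inputs are made up to show that `\s*` also accepts no space after the colon and that the optional `(?:\.\d+)?` part lets whole-number amounts through.

```csharp
using System;
using System.Text.RegularExpressions;

// Case-insensitive flag passed as an option instead of an inline (?i) modifier
var totalRegex = new Regex(@"\btotal:?\s*(\d+(?:\.\d+)?)", RegexOptions.IgnoreCase);

string[] inputs =
{
    "REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q", // colon followed by a space
    "Total:40.00",                            // \s* also allows no space
    "TOTAL 40",                               // (?:\.\d+)? makes the decimals optional
};

foreach (string input in inputs)
{
    Match m = totalRegex.Match(input);
    // Group 1 holds the amount when the match succeeds
    Console.WriteLine(m.Success ? m.Groups[1].Value : "(no match)");
}
// Prints 40.00, 40.00 and 40
```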

Upvotes: 1

Michał Turczyn

Reputation: 37337

Try this pattern:

TOTAL:? ?(\d+\.\d+)[^\d]?
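In .NET regex, as in most flavors, an unescaped `.` matches any character except a newline, so `\d+.\d+` also accepts text like "40,00" or even "4000" (the dot consumes a digit). Escaping it as `\.` restricts the match to a literal decimal point. A small C# comparison, assuming made-up inputs; the optional `[^\d]?` tail is omitted because it never changes group 1, and `Group1OrDash` is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

string unescaped = @"TOTAL:? ?(\d+.\d+)";  // '.' matches any character except \n
string escaped   = @"TOTAL:? ?(\d+\.\d+)"; // '\.' matches only a literal dot

// Hypothetical helper: group 1 when the match succeeds, "-" otherwise
string Group1OrDash(Match m) => m.Success ? m.Groups[1].Value : "-";

foreach (string input in new[] { "TOTAL: 40.00", "TOTAL: 40,00", "TOTAL: 4000" })
{
    Console.WriteLine($"{input}: unescaped={Group1OrDash(Regex.Match(input, unescaped))} escaped={Group1OrDash(Regex.Match(input, escaped))}");
}
// TOTAL: 40.00: unescaped=40.00 escaped=40.00
// TOTAL: 40,00: unescaped=40,00 escaped=-
// TOTAL: 4000: unescaped=4000 escaped=-
```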


Upvotes: 0

Sandeep Chauhan

Reputation: 171

You can use the regex below to get the amount after TOTAL:

\bTOTAL\b:?\s*([\d.]+)

It will capture the amount in the first group.

Link: https://regex101.com/r/tzze8J/1/
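A short C# sketch of this pattern, assuming .NET regex; `Amount` is a hypothetical helper name. Since `[\d.]+` accepts any run of digits and dots, a sentence-ending period right after the number is captured too; a stricter group such as `\d+(?:\.\d+)?` avoids that if it matters. The pattern is also case-sensitive as written.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"\bTOTAL\b:?\s*([\d.]+)");

// Hypothetical helper: group 1 on success, null when TOTAL is not followed by digits or dots
string? Amount(string text)
{
    Match m = regex.Match(text);
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(Amount("REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q")); // 40.00
Console.WriteLine(Amount("TOTAL: 40.00."));                          // 40.00. (trailing period included)
Console.WriteLine(Amount("TOTAL NUMBER OF ITEMS SOLD = 0") ?? "(no match)"); // (no match)
```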

Upvotes: 0
