davis

Reputation: 381

Regex - Extract second position digit from string

I have a regex:

var thisMatch = Regex.Match(result, @"(?-s).+(?=[\r\n]+The information appearing in this document)", RegexOptions.IgnoreCase);

This returns the line before "The information appearing in this document" just fine. The output of my regex is

10 880 $10,000 $800 $25 $10

I need to extract 880, which will always be in the second position. The number before 880 can vary, so a fixed-width pattern such as \d{0,2} can't be relied on.

How can I grab the second position number?

Upvotes: 0

Views: 103

Answers (2)

Thomas Weller

Reputation: 59279

If you insert

\d+\s(\d+)

this will match a leading number (\d+), separated by a single whitespace character (\s) from the number you're looking for ((\d+)), which is captured in a group so you can easily access it.

Check the Split List tab in this online demo.
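One way to read this suggestion is to run the pattern against the line the question's regex already returns. The sketch below assumes that reading; the `SecondNumber` class and its `Extract` helper are hypothetical names, and the leading `^` is an addition that is not part of the pattern above, included so that only the first two numbers on the line can match.

```csharp
using System;
using System.Text.RegularExpressions;

static class SecondNumber
{
    // Hypothetical helper: returns the second number on the line,
    // or an empty string when the line does not start with two numbers.
    public static string Extract(string line)
    {
        // ^ is an assumption added for this sketch; \d+\s(\d+) is the answer's pattern.
        Match m = Regex.Match(line, @"^\d+\s(\d+)");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        // Sample line taken from the question.
        Console.WriteLine(Extract("10 880 $10,000 $800 $25 $10")); // prints 880
    }
}
```

Because `\d+` is greedy, the width of the first number does not matter: `Extract("7 880 $1")` and `Extract("1234 880 $1")` would both yield 880 under the same assumptions.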

Upvotes: 0

Wiktor Stribiżew

Reputation: 626802

You can use something like

(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)

See the .NET regex demo. In C#:

var output = Regex.Match(result, @"(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)", RegexOptions.Multiline)?.Value;

Or, you could capture the number and grab it from a group with

^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document

See this regex demo. In C#:

var output = Regex.Match(result, @"^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document", RegexOptions.Multiline)?.Groups[1].Value;

Regex details:

  • (?<= - start of a positive lookbehind that requires its pattern to match immediately to the left of the current location:
    • ^ - start of a line (due to the RegexOptions.Multiline)
    • \S+ - one or more non-whitespace chars
    • [\p{Zs}\t]+ - one or more horizontal whitespaces
  • ) - end of the lookbehind
  • \d+ - one or more digits (use \S+ instead if the second field is always a single unbroken run of non-whitespace chars)
  • (?= - start of a positive lookahead that requires its pattern to match immediately to the right of the current location:
    • .* - the rest of the line (as . does not match an LF char)
    • [\r\n]+ - one or more CR/LF chars
    • The information appearing in this document - literal text
  • ) - end of the lookahead.
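To see the capture-group variant working end to end, here is a minimal sketch. The input string is hypothetical: only the data line and the phrase "The information appearing in this document" come from the question, while the header line and the rest of the trailing sentence are made up for illustration. `SecondFieldBeforeMarker` and `Extract` are likewise illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

static class SecondFieldBeforeMarker
{
    // Hypothetical helper: returns the second field of the line that directly
    // precedes the marker text, or an empty string when nothing matches.
    public static string Extract(string text)
    {
        Match m = Regex.Match(
            text,
            @"^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document",
            RegexOptions.Multiline); // Multiline lets ^ match at the start of every line
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        // Assumed sample input with CRLF line endings.
        string result =
            "Some header line\r\n" +
            "10 880 $10,000 $800 $25 $10\r\n" +
            "The information appearing in this document is for illustration only.";

        Console.WriteLine(Extract(result)); // prints 880
    }
}
```

Checking `m.Success` before reading the group avoids treating an empty group value as a real result when the marker line is missing.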

Upvotes: 1
