Bluestreak22

Reputation: 144

Regex pattern for simple text

I have some text that I converted from a PDF file, and I now need to extract specific contents from it using regex. In the past I used indexes and math to get a substring of a specific length.

This is my text:

1ZW6897X0327621544

Each one will start with 1Z and be 18 characters long.

I have tried using Regexr.com for help, but it does not make sense to me. This is what I came up with:

1Z[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]

This is how my brain is processing what I am reading: 1Z is the start, and then any character 0-9 for the next 16 places?

Can someone please help?

Upvotes: 2

Views: 90

Answers (1)

Wiktor Stribiżew

Reputation: 626845

You may use

\b1Z[A-Z0-9]{16}\b

Or

\b1Z\w{16}\b

See the regex demo

Details

  • \b - a word boundary
  • 1Z - a literal substring
  • [A-Z0-9]{16} - 16 uppercase ASCII letters and/or digits. Unlike [0-9], this also allows the letters that appear in the sample, such as W and X. Note that \w matches any letters, digits, and _, and unless you pass RegexOptions.ECMAScript, it matches all Unicode letters and digits, plus some more "funny" symbols.
  • \b - a word boundary.
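
To make the difference between the two character classes concrete, here is a minimal sketch. Only the first input is from the question; the other three strings are made-up inputs for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class CharClassDemo
{
    static void Main()
    {
        string strict = @"\b1Z[A-Z0-9]{16}\b";
        string loose = @"\b1Z\w{16}\b";

        string sample = "1ZW6897X0327621544";           // from the question
        string lower = "1Zw6897x0327621544";            // assumed input: lowercase letters
        string underscore = "1ZW6897X03276215_4";       // assumed input: contains an underscore
        string arabicDigit = "1ZW6897X032762154\u0664"; // assumed input: ends with a non-ASCII digit

        Console.WriteLine(Regex.IsMatch(sample, strict));      // True
        Console.WriteLine(Regex.IsMatch(lower, strict));       // False: [A-Z] is case-sensitive by default
        Console.WriteLine(Regex.IsMatch(lower, loose));        // True: \w accepts lowercase letters
        Console.WriteLine(Regex.IsMatch(underscore, loose));   // True: \w accepts _
        Console.WriteLine(Regex.IsMatch(arabicDigit, loose));  // True: \w accepts Unicode digits

        // With RegexOptions.ECMAScript, \w is restricted to [a-zA-Z_0-9].
        Console.WriteLine(Regex.IsMatch(arabicDigit, loose, RegexOptions.ECMAScript)); // False
    }
}
```

If the values should only ever contain uppercase letters and digits, the explicit [A-Z0-9] class is the safer choice.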

If the boundaries are whitespace (i.e. the matches are expected to be preceded by the start of the string or whitespace and followed by the end of the string or whitespace), you may use the (?<!\S)1Z[A-Z0-9]{16}(?!\S) pattern instead.
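
A small sketch of how the two kinds of boundaries behave differently. The input string is invented for illustration and assumes the converted PDF text may have punctuation glued to some values.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class BoundaryDemo
{
    static void Main()
    {
        // Assumed input: three candidates, two of them touching punctuation.
        string s = "ID:1ZW6897X0327621544, 1ZW6897X0327621545 x-1ZW6897X0327621546";

        // \b only requires a switch between word and non-word characters,
        // so ':' , ',' and '-' all count as valid boundaries.
        var wordBounded = Regex.Matches(s, @"\b1Z[A-Z0-9]{16}\b")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

        // The lookarounds require whitespace or the string start/end on both sides.
        var spaceBounded = Regex.Matches(s, @"(?<!\S)1Z[A-Z0-9]{16}(?!\S)")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

        Console.WriteLine(wordBounded.Count);               // 3
        Console.WriteLine(string.Join(",", spaceBounded));  // 1ZW6897X0327621545
    }
}
```

Which variant fits depends on how the values are delimited in your extracted text.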

In C#, you may use it with Regex.Matches:

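using System.Linq;
using System.Text.RegularExpressions;

// s holds the text converted from the PDF
var results = Regex.Matches(s, @"\b1Z[A-Z0-9]{16}\b")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();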

Upvotes: 4
