Reputation: 144
I have some text that I converted from a PDF file, and I now need to extract specific contents from the text using regex. In the past I used indexes and math to get a substring of a specific length.
This is my text:
1ZW6897X0327621544
Each one will start with 1Z and be 18 characters long.
I have tried going to Regexr.com to help but it just does not make any sense at all:
1Z[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
This is how my brain is processing what I am reading: 1Z is the start, and then any character 0-9 for the next 16 places?
Can someone please help?
Upvotes: 2
Views: 90
Reputation: 626845
You may use
\b1Z[A-Z0-9]{16}\b
Or
\b1Z\w{16}\b
Details
\b - a word boundary
1Z - a literal substring
[A-Z0-9]{16} - 16 uppercase ASCII letters and/or digits (note that \w will match any letters, digits, and/or _ and, if you do not pass RegexOptions.ECMAScript, it will match all Unicode letters/digits, and some more "funny" symbols)
\b - a word boundary.
If the boundaries are whitespace (i.e. the matches are expected to be preceded by the start of string or whitespace and followed by the end of string or whitespace), you may use the (?<!\S)1Z[A-Z0-9]{16}(?!\S) pattern instead.
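As a rough illustration of how the two boundary styles differ, here is a minimal C# sketch. The sample string is made up for this example (it is not from the question), and it assumes a project that allows top-level statements (C# 9 or later):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: one bare number and one wrapped in parentheses.
string sample = "1ZW6897X0327621544 (1ZW6897X0327621545)";

// \b only needs a word/non-word transition, so the parenthesized value matches too.
var wordBoundary = Regex.Matches(sample, @"\b1Z[A-Z0-9]{16}\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// (?<!\S) and (?!\S) demand whitespace or start/end of string on each side,
// so the value enclosed in parentheses is skipped.
var whitespaceBoundary = Regex.Matches(sample, @"(?<!\S)1Z[A-Z0-9]{16}(?!\S)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(string.Join(", ", wordBoundary));
Console.WriteLine(string.Join(", ", whitespaceBoundary));
```

Which variant fits depends on whether the numbers in the converted PDF text can sit next to punctuation; if they can, the \b version is probably the safer choice.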
In C#, you may use it with Regex.Matches:
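// Requires: using System.Linq; using System.Text.RegularExpressions;
// s is the text converted from the PDF.
var results = Regex.Matches(s, @"\b1Z[A-Z0-9]{16}\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();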
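For a version that can be run as is, the sketch below wraps the same call in a small program and contrasts the [A-Z0-9] class with \w. The input text is invented for illustration (only the first number comes from the question), and top-level statements (C# 9 or later) are assumed:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical text standing in for the PDF-converted input.
string s = "Ship to: ACME\nTracking: 1ZW6897X0327621544\nRef: 1Zabc_ef0327621544\n";

// Strict class: only uppercase ASCII letters and digits may follow 1Z.
var strict = Regex.Matches(s, @"\b1Z[A-Z0-9]{16}\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// \w also accepts lowercase letters and underscores, so the "Ref" value matches as well.
var loose = Regex.Matches(s, @"\b1Z\w{16}\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine($"strict: {string.Join(", ", strict)}");
Console.WriteLine($"loose:  {string.Join(", ", loose)}");
```

If the numbers are always uppercase letters and digits, the stricter class is likely the better fit, since it will not pick up look-alike strings containing lowercase letters or underscores.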
Upvotes: 4