JL Delos Reyes

Reputation: 47

How to substring in RegExp?

Does anyone know how to take a substring with a regular expression? I am currently profiling data and I see different formats, such as:

EB0000000

EB00000000PHL00000000F00000000

P0000000A

When I used my expression: \b(?:[A-Z]{1}\d{7}[A-Z]{1}|[A-Z]{1}\d{7,8}|[A-Z]{2}\d{6}|[A-Z]{2}\d{7,8})\b

It captures the first and last samples. The second one looks like improper data, but I still want to capture EB and the 8 digits before PHL. Is this possible with a regex? TIA
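To see the failure concretely, here is a minimal sketch that runs the original pattern over the three samples. It assumes Python's `re` module, since the question does not say which language or tool the regex runs in. The middle sample yields nothing: it consists only of word characters, so `\b` holds only at its very start and end, and the trailing `\b` can never be satisfied right after `EB` plus 8 digits (the next character, `P`, is also a word character).

```python
import re

# The original pattern from the question, including the trailing \b.
ORIGINAL = re.compile(
    r"\b(?:[A-Z]{1}\d{7}[A-Z]{1}|[A-Z]{1}\d{7,8}|[A-Z]{2}\d{6}|[A-Z]{2}\d{7,8})\b"
)

SAMPLES = ["EB0000000", "EB00000000PHL00000000F00000000", "P0000000A"]


def original_matches(text: str):
    """Return every full match of the original pattern in text."""
    # The group is non-capturing, so findall returns whole matches.
    return ORIGINAL.findall(text)


for sample in SAMPLES:
    print(sample, "->", original_matches(sample))
```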

Upvotes: 1

Views: 85

Answers (2)

The fourth bird

Reputation: 163362

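It is possible. Change the order of the alternatives to put the most specific (longest) ones at the beginning, and remove the word boundary at the end. Without the trailing \b, the engine accepts the first alternative that matches, so [A-Z]{2}\d{7,8} has to be tried before [A-Z]{2}\d{6}, and [A-Z]\d{7}[A-Z] before [A-Z]\d{7,8}.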
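A small sketch of why the order matters once the trailing `\b` is gone, assuming Python's `re` module (a backtracking engine that, like most, takes the first alternative that succeeds rather than the longest). Each pair uses the same alternatives; only their order differs.

```python
import re


def first_match(pattern: str, text: str):
    """Return the first match of pattern in text, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# Shorter alternative first: it succeeds early and the trailing letter is lost.
print(first_match(r"\b(?:[A-Z]\d{7,8}|[A-Z]\d{7}[A-Z])", "P0000000A"))   # P0000000
# Longer alternative first: the whole code is matched.
print(first_match(r"\b(?:[A-Z]\d{7}[A-Z]|[A-Z]\d{7,8})", "P0000000A"))   # P0000000A

# Same effect with the two-letter prefixes.
print(first_match(r"\b(?:[A-Z]{2}\d{6}|[A-Z]{2}\d{7,8})", "EB0000000"))  # EB000000
print(first_match(r"\b(?:[A-Z]{2}\d{7,8}|[A-Z]{2}\d{6})", "EB0000000"))  # EB0000000
```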

Note that you can omit {1}: a token without a quantifier already matches exactly once.

\b(?:[A-Z]{2}\d{7,8}|[A-Z]\d{7}[A-Z]|[A-Z]\d{7,8}|[A-Z]{2}\d{6})

In parts

  • \b Word boundary
  • (?: Non capture group
    • [A-Z]{2}\d{7,8} Match 2 letters A-Z and 7-8 digits
    • | Or
    • [A-Z]\d{7}[A-Z] Match A-Z, 7 digits and A-Z
    • | Or
    • [A-Z]\d{7,8} Match A-Z and 7-8 digits
    • | Or
    • [A-Z]{2}\d{6} Match 2 letters A-Z and 6 digits
  • ) Close group

Regex demo
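A minimal check of the pattern above, assuming Python's `re` module (the question does not name a language). It extracts `EB` plus the 8 digits from the middle sample and leaves the other two samples whole, including when all three sit on one line separated by spaces.

```python
import re

# Longer alternatives first, no trailing \b.
ANSWER = re.compile(r"\b(?:[A-Z]{2}\d{7,8}|[A-Z]\d{7}[A-Z]|[A-Z]\d{7,8}|[A-Z]{2}\d{6})")

SAMPLES = ["EB0000000", "EB00000000PHL00000000F00000000", "P0000000A"]


def extract_ids(text: str):
    """Return every match of the answer's pattern in text."""
    return ANSWER.findall(text)


for sample in SAMPLES:
    print(sample, "->", extract_ids(sample))

print(extract_ids(" ".join(SAMPLES)))
```

One trade-off of dropping the trailing `\b`: the pattern also returns a prefix of any longer token, for example `P00000000` out of `P000000000`.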

Upvotes: 1

Blissful

Reputation: 36

Why write it in such a complicated way? Unless there are nearby lines that should not be selected, this is enough:

\b[A-Z\d]{8,}\b
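A quick check of this pattern on the three samples, assuming Python's `re` module for illustration. It selects each sample as a whole, so for the middle one it returns the entire token rather than only `EB` and the 8 digits; it suits selecting the lines, not extracting the prefix.

```python
import re

# Any run of 8 or more uppercase letters/digits, bounded by \b on both sides.
SIMPLE = re.compile(r"\b[A-Z\d]{8,}\b")

text = "EB0000000\nEB00000000PHL00000000F00000000\nP0000000A"
print(SIMPLE.findall(text))
```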

Upvotes: 2
