Tidra

Reputation: 95

Match up to the comma - Regex

I have created a regex pattern:

(?<=[TCC|TCC_BHPB]\s\d{3,4})[-_\s]\d{1,2}[,]

This pattern only matches:

TCC 6005_5,

What should I change at the end of the pattern so that it matches both of these strings:

TCC 6005-5 ,
TCC 6005_5,
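A minimal reproduction sketch, assuming a JavaScript/TypeScript regex engine (Node.js, ES2018 or later). The question does not say which regex flavor is used, so treat this as an illustration rather than the asker's setup. The pattern is passed as a string literal, so every backslash is doubled, and the variable name original is hypothetical:

```typescript
// Assumption: a JavaScript/TypeScript regex engine (Node.js, ES2018+); the
// question does not name its regex flavor. Backslashes are doubled because
// the pattern is written as a string literal.
const original = new RegExp("(?<=[TCC|TCC_BHPB]\\s\\d{3,4})[-_\\s]\\d{1,2}[,]");

const inputs: string[] = ["TCC 6005-5 ,", "TCC 6005_5,"];
for (const s of inputs) {
  const m = s.match(original);
  console.log(JSON.stringify(s), "->", m ? JSON.stringify(m[0]) : "no match");
}
// "TCC 6005-5 ," -> no match
// "TCC 6005_5," -> "_5,"
```

The hyphen itself is already covered by [-_\s]; the first string fails only because, after \d{1,2} matches 5, the pattern demands a comma immediately and finds a space instead.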

Upvotes: 0

Views: 103

Answers (2)

The fourth bird

Reputation: 163207

This part of the pattern, [TCC|TCC_BHPB], is a character class that matches a single character from the listed ones. It could also be written, for example, as [|_TCBHP].
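A quick check of this, sketched in TypeScript under the assumption of a JavaScript regex engine (the names asClass, asClassShort and zipMatch, and the sample "ZIP 6005_5,", are illustrative). It also shows a side effect: because the lookbehind only checks one character before the space, any prefix ending in one of those characters is accepted:

```typescript
// Assumption: JavaScript/TypeScript regex engine (Node.js, ES2018+).
// [TCC|TCC_BHPB] is a character class: it matches exactly ONE character
// from the set T, C, |, _, B, H, P (repeated characters are redundant).
const asClass = new RegExp("^[TCC|TCC_BHPB]$");
// The shorter class from the answer contains the same set of characters.
const asClassShort = new RegExp("^[|_TCBHP]$");

for (const s of ["T", "|", "_", "TCC", "TCC_BHPB", "X"]) {
  // Anchored with ^ and $, so a string passes only if it is a single class character.
  console.log(JSON.stringify(s), asClass.test(s), asClassShort.test(s));
}

// Side effect in the original lookbehind: only the single character before
// the space is checked, so an unrelated prefix such as "ZIP" also passes.
const original = new RegExp("(?<=[TCC|TCC_BHPB]\\s\\d{3,4})[-_\\s]\\d{1,2}[,]");
const zipMatch = "ZIP 6005_5,".match(original);
console.log(zipMatch ? zipMatch[0] : "no match"); // prints _5,
```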

To "match" both strings, you can match all parts instead of using a positive lookbehind.

\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,

  • \bTCC A word boundary to prevent a partial match, then match TCC
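  • (?:_BHPB)?\s\d{3,4} Optionally match _BHPB, then match a whitespace char and 3-4 digits (use [0-9] to match only the ASCII digits 0-9, since in some flavors, such as .NET, \d also matches other Unicode digits)
  • [-_\s]\d{1,2} Match one of -, _ or a whitespace char, then 1-2 digits
  • \s?, Match an optional whitespace char and a comma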

Note that \s can also match a newline.
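A sketch exercising this pattern in TypeScript, assuming a JavaScript regex engine; the first two samples come from the question and the rest are made up for illustration. Because there is no lookbehind, the whole text from TCC up to the comma becomes the match:

```typescript
// Assumption: JavaScript/TypeScript regex engine (Node.js). Pattern from the
// answer, with backslashes doubled for the string literal.
const full = new RegExp("\\bTCC(?:_BHPB)?\\s\\d{3,4}[-_\\s]\\d{1,2}\\s?,");

// The first two strings come from the question; the rest are illustrative.
const samples: string[] = [
  "TCC 6005-5 ,",
  "TCC 6005_5,",
  "TCC_BHPB 1234 12,",
  "TCC\n6005_5,", // \s also matches the newline here
  "xTCC 6005_5,", // no word boundary before TCC, so no match
];

for (const s of samples) {
  const m = s.match(full);
  console.log(JSON.stringify(s), "->", m ? JSON.stringify(m[0]) : "no match");
}
```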


Using the lookbehind:

(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,
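The lookbehind variant, sketched the same way under the assumption of a JavaScript regex engine. JavaScript (ES2018+) and .NET accept a variable-length lookbehind like this one, but not every flavor does; Python's re module, for instance, requires a fixed-width lookbehind. Only the part after the digits is returned:

```typescript
// Assumption: JavaScript/TypeScript regex engine (Node.js, ES2018+), which,
// like .NET, accepts this variable-length lookbehind.
const behind = new RegExp("(?<=TCC(?:_BHPB)?\\s\\d{3,4})[-_\\s]\\d{1,2}\\s?,");

for (const s of ["TCC 6005-5 ,", "TCC 6005_5,"]) {
  const m = s.match(behind);
  // Only the text after the lookbehind is part of the match.
  console.log(JSON.stringify(s), "->", m ? JSON.stringify(m[0]) : "no match");
}
// "TCC 6005-5 ," -> "-5 ,"
// "TCC 6005_5," -> "_5,"
```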

Or, if you want to match only horizontal whitespace (spaces and tabs) and not a newline:

\bTCC(?:_BHPB)?[\p{Zs}\t][0-9]{3,4}[-_\p{Zs}\t][0-9]{1,2}[\p{Zs}\t]*,
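A sketch of the horizontal-whitespace version in TypeScript, again assuming a JavaScript regex engine; the samples beyond the question's two are illustrative. In JavaScript the u flag is needed for \p{Zs} to be read as the Unicode "space separator" category, and \t is listed separately because the tab character belongs to category Cc, not Zs:

```typescript
// Assumption: JavaScript/TypeScript regex engine (Node.js, ES2018+).
// The "u" flag is required: without it, \p{Zs} is not a Unicode property escape.
const horizontal = new RegExp(
  "\\bTCC(?:_BHPB)?[\\p{Zs}\\t][0-9]{3,4}[-_\\p{Zs}\\t][0-9]{1,2}[\\p{Zs}\\t]*,",
  "u"
);

const lines: string[] = [
  "TCC 6005-5 ,",
  "TCC 6005_5,",
  "TCC 6005_5   ,", // several spaces before the comma are allowed
  "TCC\n6005_5,", // a newline is neither Zs nor a tab, so no match
];

for (const s of lines) {
  const m = s.match(horizontal);
  console.log(JSON.stringify(s), "->", m ? JSON.stringify(m[0]) : "no match");
}
```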

Upvotes: 0

ProgrammingLlama

Reputation: 38727

You can add a non-greedy wildcard to your expression (.*?):

(?<=(?:TCC|TCC_BHPB)\s\d{3,4})[-_\s]\d{1,2}.*?[,]
                                           ^^^

This will now also match any characters between the last digit and the comma.

As has been pointed out in the comments, [TCC|TCC_BHPB] is a character class rather than a literal match, so I've changed it to (?:TCC|TCC_BHPB), which is presumably what you intended.
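A TypeScript sketch of this version, assuming a JavaScript regex engine; the third and fourth samples are made up for illustration. The lazy .*? lets arbitrary text sit between the digits and the comma, although without the s flag it does not cross a newline:

```typescript
// Assumption: JavaScript/TypeScript regex engine (Node.js, ES2018+).
const lazy = new RegExp("(?<=(?:TCC|TCC_BHPB)\\s\\d{3,4})[-_\\s]\\d{1,2}.*?[,]");

const tests: string[] = [
  "TCC 6005-5 ,",
  "TCC 6005_5,",
  "TCC 6005_5 xyz,", // .*? also consumes other characters before the comma
  "TCC_BHPB 6005_5,", // second alternative of the lookbehind
];

for (const s of tests) {
  const m = s.match(lazy);
  console.log(JSON.stringify(s), "->", m ? JSON.stringify(m[0]) : "no match");
}
// "TCC 6005-5 ," -> "-5 ,"
// "TCC 6005_5," -> "_5,"
// "TCC 6005_5 xyz," -> "_5 xyz,"
// "TCC_BHPB 6005_5," -> "_5,"
```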

Upvotes: 1
