Daniel Minnaar

Reputation: 6295

Extracting words from lines that match different patterns

I'm monitoring incoming e-mail subjects, and each subject may contain a specially formatted code that I later use to reference something else.

These codes can appear anywhere within the string, and sometimes not at all. The problem I'm having is my lack of regex skills (I assume regex is the best option for this?).

An example of a subject would be:

"Please refer to reference MZ5051CLA"
or
"Attention for Mr Danshi, RE. 11123MTX"

The codes I'm looking to extract in these scenarios are "MZ5051CLA" and "11123MTX".

The format of MZ5051CLA will be:
  - Always starts with "MZ"
  - Is followed by a number
  - Always ends with "CLA"

Is there a simple way to evaluate the subject as a whole and extract any words that match the codes only?

I've looked at various solutions to similar problems here on SO, but they're either overly complicated or I can't quite relate them to my case.

Edit:

As ShashishChandra pointed out, the idea is to monitor multiple mailboxes, each with their own code formats. So my idea was to implement a regex setting for each mailbox.

Perhaps this was important to mention initially, since a solution to catch all formats in one regex won't work. Apologies for that.
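A minimal sketch of the per-mailbox idea, in Python purely for illustration. The mailbox names are hypothetical, and the second pattern (digits followed by "MTX") is only a guess based on the single sample code, since I've only spelled out the MZ...CLA format:

```python
import re

# Hypothetical mailbox names. Only the MZ...CLA format is specified above;
# the "digits then MTX" pattern is an assumption based on one sample code.
MAILBOX_PATTERNS = {
    "claims": re.compile(r"\bMZ\d+CLA\b"),
    "matters": re.compile(r"\b\d+MTX\b"),
}


def extract_codes(mailbox, subject):
    """Return every code in `subject` that matches the mailbox's pattern."""
    pattern = MAILBOX_PATTERNS.get(mailbox)
    if pattern is None:
        return []  # unknown mailbox: nothing to extract
    return pattern.findall(subject)


print(extract_codes("claims", "Please refer to reference MZ5051CLA"))
print(extract_codes("matters", "Attention for Mr Danshi, RE. 11123MTX"))
print(extract_codes("claims", "No code in this subject"))
```

Keeping one pattern per mailbox means each setting stays short and can be changed without touching the others.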

Upvotes: 1

Views: 96

Answers (4)

Shashish Chandra

Reputation: 499

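In that case, if you don't mind false positives, use: /^(?=.*[0-9])(?=.*[A-Z])([A-Z0-9]+)$/. Because of the ^ and $ anchors, it has to be tested against each word of the subject rather than the whole subject. This will work well in general.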
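A rough sketch of testing that pattern against one word at a time, in Python for illustration. The helper name and the punctuation stripping are my own assumptions, not part of the pattern:

```python
import re

# The anchored pattern from above: the whole string must consist of
# uppercase letters and digits, with at least one of each.
WORD_IS_CODE = re.compile(r"^(?=.*[0-9])(?=.*[A-Z])([A-Z0-9]+)$")


def candidate_codes(subject):
    """Hypothetical helper: return the words of `subject` that look like codes."""
    # Split on whitespace and strip common punctuation from each word.
    words = (w.strip(".,;:!?\"'()") for w in subject.split())
    return [w for w in words if WORD_IS_CODE.match(w)]


print(candidate_codes("Please refer to reference MZ5051CLA"))
print(candidate_codes("Attention for Mr Danshi, RE. 11123MTX"))
print(candidate_codes("Call 0800HELP now"))  # a false positive
```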

Upvotes: 0

zx81

Reputation: 41838

Both Codes in One Pattern

It seems that the codes must include at least one uppercase letter and at least one digit. For that kind of pattern, a password-validation technique is commonly used, and I would suggest:

\b(?=[A-Z0-9]*[A-Z])[A-Z0-9]*[0-9][A-Z0-9]*

On the two sample subjects, only the codes themselves are matched. Of course, false positives are possible.
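To try the pattern quickly, here is a small Python sketch (Python is just my choice for the illustration, and the third subject is made up to show a non-match):

```python
import re

# Lookahead requires at least one uppercase letter in the run;
# the main pattern requires at least one digit.
CODE = re.compile(r"\b(?=[A-Z0-9]*[A-Z])[A-Z0-9]*[0-9][A-Z0-9]*")

subjects = [
    "Please refer to reference MZ5051CLA",
    "Attention for Mr Danshi, RE. 11123MTX",
    "Invoice 2024 for ORDER",  # digits only / letters only: no match
]

# findall returns the full matches because the pattern has no capture groups.
results = [CODE.findall(s) for s in subjects]
for subject, found in zip(subjects, results):
    print(subject, "->", found)
```

Plain words such as "RE" or "2024" are skipped because they lack either a digit or an uppercase letter.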

Upvotes: 1

Avinash Raj

Reputation: 174706

The regex below matches only the first string, MZ5051CLA:

\bMZ\d+CLA\b

But this one matches both strings, MZ5051CLA and 11123MTX:

\b[A-Z0-9]+$

It matches any run of uppercase letters and digits at the end of a line.
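A short Python sketch of what the end-of-line anchor implies (Python and the helper name are just for illustration; the last two subjects are made up): the pattern picks up the last word whether or not it is a code, and misses a code that appears earlier in the subject.

```python
import re

LAST_WORD = re.compile(r"\b[A-Z0-9]+$")


def last_token(subject):
    """Hypothetical helper: return the trailing uppercase/digit run, or None."""
    m = LAST_WORD.search(subject)
    return m.group(0) if m else None


print(last_token("Please refer to reference MZ5051CLA"))
print(last_token("Attention for Mr Danshi, RE. 11123MTX"))
print(last_token("Please call me BACK"))         # matches a non-code
print(last_token("MZ5051CLA is the reference"))  # code not at the end
```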

This one matches an alphanumeric string that starts with MZ and ends with CLA, or one that starts with a number and ends with MTX:

(?:\bMZ\d+CLA\b|\b\d+MTX\b)

Upvotes: 1

Stephan

Reputation: 43013

Try this regex:

^.*(?:(MZ\d+CLA)|RE\.\s+(\d+MTX))$
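A minimal Python sketch of reading the result (Python and the helper name are assumptions for illustration). The code ends up in capture group 1 or 2 depending on which alternative matched, and the trailing $ means the code has to be the last thing in the subject:

```python
import re

PATTERN = re.compile(r"^.*(?:(MZ\d+CLA)|RE\.\s+(\d+MTX))$")


def extract_code(subject):
    """Hypothetical helper: return the captured code, or None if no match."""
    m = PATTERN.match(subject)
    if m is None:
        return None
    # Exactly one of the two groups is set when the pattern matches.
    return m.group(1) or m.group(2)


print(extract_code("Please refer to reference MZ5051CLA"))
print(extract_code("Attention for Mr Danshi, RE. 11123MTX"))
print(extract_code("MZ5051CLA, please advise"))  # code not at the end
```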

Upvotes: 2
