SuS

Reputation: 71

Regex to find numbers from String with different format

I've got the following text:

instance=hostname1, topic="AB_CD_EF_12345_ZY_XW_001_000001"
instance=hostname2, topic="AB_CD_EF_1345_ZY_XW_001_00001"
instance=hostname1, topic="AB_CD_EF_1235_ZY_XW_001_000001"
instance=hostname2, topic="AB_CD_EF_GH_4567_ZY_XW_01_000001"
instance=hostname1, topic="AB_CD_EF_35678_ZY_XW_001_00001"
instance=hostname2, topic="AB_CD_EF_56789_ZY_XW_001_000001"

I would like to capture numbers from the sample above. I've tried to do so with the regular expressions below and they work well as separate queries:

Regex: .*topic="AB_CD_EF_([^_]+).*
Matches: 12345 1345 1235

Regex: .*topic="AB_CD_EF_GH_([^_]+).*
Matches: 4567 35678 56789

But I need a single regex that gives me all of the numbers, i.e.:

12345 1345 1235 4567 35678 56789

Upvotes: 1

Views: 2100

Answers (4)

SuS

Reputation: 71

This regex worked for me:

/.*topic="(?:[AB_CD_EF_(GH_)]{2,3}_)+([^_]+).*/

Upvotes: 0

Emma

Reputation: 27723

Another option would be an expression similar to:

topic=".*?[A-Z]_([0-9]+)_.*?"

The desired digits are in the capturing group ([0-9]+).

Please see the demo for additional explanation.
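
As a minimal sketch of how this expression could be applied, the Python snippet below runs it over the sample lines from the question with re.findall. Python and the names text, pattern and numbers are assumptions made for illustration; other regex engines may differ in detail.

```python
import re

# Sample lines copied from the question.
text = "\n".join([
    'instance=hostname1, topic="AB_CD_EF_12345_ZY_XW_001_000001"',
    'instance=hostname2, topic="AB_CD_EF_1345_ZY_XW_001_00001"',
    'instance=hostname1, topic="AB_CD_EF_1235_ZY_XW_001_000001"',
    'instance=hostname2, topic="AB_CD_EF_GH_4567_ZY_XW_01_000001"',
    'instance=hostname1, topic="AB_CD_EF_35678_ZY_XW_001_00001"',
    'instance=hostname2, topic="AB_CD_EF_56789_ZY_XW_001_000001"',
])

# The lazy .*? advances only as far as the first
# "letter, underscore, digits, underscore" run after topic=".
pattern = re.compile(r'topic=".*?[A-Z]_([0-9]+)_.*?"')

# With a single capturing group, findall returns that group's text per match.
numbers = pattern.findall(text)
print(numbers)
```

Because "." does not match a newline by default, each match stays within one line of the input.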

Upvotes: 1

Bohemian

Reputation: 425063

Make GH_ optional:

.*topic="AB_CD_EF_(GH_)?([^_]+).*

which matches all your target numbers (they end up in capture group 2).

See live demo.
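
A short Python sketch of the group numbering, under the assumption that Python's re module is the engine in use: the optional (GH_) is itself capturing group 1, so the number is read from group 2. The non-capturing variant (?:GH_)? is my own variation, not part of the expression above, and moves the number to group 1.

```python
import re

# Two of the sample lines from the question, one without and one with GH_.
lines = [
    'instance=hostname1, topic="AB_CD_EF_12345_ZY_XW_001_000001"',
    'instance=hostname2, topic="AB_CD_EF_GH_4567_ZY_XW_01_000001"',
]

# (GH_)? is capturing group 1, so the number lands in group 2.
capturing = re.compile(r'.*topic="AB_CD_EF_(GH_)?([^_]+).*')

# Assumed variant: (?:GH_)? does not capture, so the number is group 1.
non_capturing = re.compile(r'.*topic="AB_CD_EF_(?:GH_)?([^_]+).*')

via_group_2 = [capturing.match(line).group(2) for line in lines]
via_group_1 = [non_capturing.match(line).group(1) for line in lines]
print(via_group_2, via_group_1)
```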


You could be more general by allowing any number of "letter letter underscore" sequences using:

.*topic="(?:[A-Z]{2}_)+([^_]+).*

See live demo.
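
To illustrate the more general form, here is a rough Python sketch. The third input line is hypothetical (it does not appear in the question) and is there only to show that the prefix no longer has to be AB_CD_EF_. Note that this form assumes every segment before the number is exactly two upper-case letters.

```python
import re

# Any number of "two upper-case letters + underscore" segments, then the number.
pattern = re.compile(r'.*topic="(?:[A-Z]{2}_)+([^_]+).*')

lines = [
    # Two sample lines from the question.
    'instance=hostname1, topic="AB_CD_EF_12345_ZY_XW_001_000001"',
    'instance=hostname2, topic="AB_CD_EF_GH_4567_ZY_XW_01_000001"',
    # Hypothetical line with a different prefix, for illustration only.
    'instance=hostname3, topic="QR_ST_9876_ZY_XW_001_000001"',
]

numbers = [pattern.match(line).group(1) for line in lines]
print(numbers)
```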

Upvotes: 2

Matthew

Reputation: 1943

From the examples and conditions you've given, I think you're going to need a fairly restrictive regex, though that depends on how you want to adapt it. Take a look at the following regex and read the breakdown to see what it does. It has a single capture group; use that first group to retrieve the numbers you are looking for.

Regex

^instance\=hostname[0-9]+\,\s*topic\=\"[A-Z_]+([0-9]+)_[A-Z_]+[0-9_]+\"$

Try it out in this DEMO.

Breakdown

^                # Asserts position at start of the line
hostname[0-9]+   # Matches the literal text "hostname" followed by one or more digits
\s*              # Matches whitespace characters (between 0 and unlimited times)
[A-Z_]+          # Matches any upper-case letter or underscore (between 1 and unlimited times)
([0-9]+)         # This captures the number you want
$                # Asserts position at end of the line
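
The breakdown above could be exercised with a sketch like the following. It assumes Python's re module, straight double quotes in the input (as in the question's sample), and the re.MULTILINE flag so that ^ and $ anchor at every line. The last input line is a hypothetical non-conforming line added only to show how restrictive the expression is.

```python
import re

text = "\n".join([
    # Sample lines copied from the question.
    'instance=hostname1, topic="AB_CD_EF_12345_ZY_XW_001_000001"',
    'instance=hostname2, topic="AB_CD_EF_1345_ZY_XW_001_00001"',
    'instance=hostname1, topic="AB_CD_EF_1235_ZY_XW_001_000001"',
    'instance=hostname2, topic="AB_CD_EF_GH_4567_ZY_XW_01_000001"',
    'instance=hostname1, topic="AB_CD_EF_35678_ZY_XW_001_00001"',
    'instance=hostname2, topic="AB_CD_EF_56789_ZY_XW_001_000001"',
    # Hypothetical line (lower-case topic) that the pattern should reject.
    'instance=hostname3, topic="ab_cd_12345_zy_001"',
])

# re.MULTILINE makes ^ and $ match at the start and end of each line.
pattern = re.compile(
    r'^instance\=hostname[0-9]+\,\s*topic\=\"[A-Z_]+([0-9]+)_[A-Z_]+[0-9_]+\"$',
    re.MULTILINE,
)

# One capture group, so findall returns the captured number for each matching line.
numbers = pattern.findall(text)
print(numbers)
```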

Although this answers the question as asked, I fear it might not be exactly what you're looking for; without further information, it is the best I can give you. In any case, once you've studied the breakdown and played around with the demo a bit, it should prove to be of some help.

Upvotes: 0
