Mani Kanth

Reputation: 123

Regex - orderless extraction of string

I have 2 strings, each of which is one record:

string1 = "abc/BS-QANTAS\\/DS-12JUL15\\dfd"
string2 = "/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"

BS is the booking airline and DS is the date.

I want to use a single regex to extract the booking source and the date. Please let me know if this is feasible. I have tried lookaheads but still couldn't get it to work.

The target language is Splunk, not JavaScript. Whatever the language, please post your answer and I'll give it a try in Splunk.

Upvotes: 1

Views: 716

Answers (3)

Alan Moore

Reputation: 75222

Here's a more scalable (and more readable, IMO) alternative to miroxlav's answer:

(?:\/BS-(?P<source>\w+)|\/DS-(?P<date>\w+)|[^\/\v]+)+

I'm assuming the fields you're interested in always start with a slash. That allows me to use [^/]+ to safely consume the junk between/around them.

demo

This is effectively three regexes in one, wrapped in a group, to give each one a chance to match in turn, and applied multiple times. If the first alternative matches, you're looking at a "source airline" field, and the name is captured in the group named "source". If the second alternative matches, you're looking at the date, which is captured in the "date" group.

But, because the fields aren't in a predetermined order, the regex has to match the whole string to be sure of matching both fields (in fact, I should have used start and end anchors--^ and $--to enforce that; I've added them below). The third alternative, [^/]+, allows it to consume the parts that the first two can't, thus making an overall match possible. Here's the updated regex:

^(?:\/BS-(?P<source>\w+)|\/DS-(?P<date>\w+)|[^\/\v]+)+$

...and the updated demo. As noted in the comment, the \v is there only because I'm combining your two examples into one multiline string and doing two matches. You shouldn't need it in real life.
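For anyone who wants to try the anchored pattern outside regex101, here is a minimal Python sketch. It assumes the backslashes in the sample records are literal characters, and it drops the \v since each record is matched on its own. The function name `extract` is made up for this illustration; Python needs the `(?P<name>...)` spelling for named groups, which the pattern above already uses.

```python
import re

# The anchored pattern from the answer, without the \v that was only
# needed for the multiline demo.
PATTERN = re.compile(r"^(?:/BS-(?P<source>\w+)|/DS-(?P<date>\w+)|[^/]+)+$")

def extract(record):
    """Return (source, date) for one record, or None if it does not match."""
    m = PATTERN.match(record)
    if m is None:
        return None
    return m.group("source"), m.group("date")

# Raw strings: the backslashes are kept literally, as written in the question.
string1 = r"abc/BS-QANTAS\\/DS-12JUL15\\dfd"
string2 = r"/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"

print(extract(string1))  # ('QANTAS', '12JUL15')
print(extract(string2))  # ('AIRFRANCE', '10JUN15')
```

Because the whole record has to match, a record containing a slash that does not start a BS or DS field yields no match at all rather than a partial result.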

Upvotes: 1

miroxlav

Reputation: 12194

This captures both values either in the match groups airline1+date1 or in airline2+date2, depending on which field comes first:

((BS-(?<airline1>\w+).*DS-(?<date1>[\w]+))|(DS-(?<date2>[\w]+).*BS-(?<airline2>\w+)))

>> view at regex101.com

Since there are only 2 fields, I used a simple permutation of the two possible orders.

If a field occurs more than once, this regex will take the last occurrence. If you need the earliest one (using lookbehind), let me know.
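A rough Python sketch of how the two pairs of groups can be folded back into one result. Python's `re` module does not accept `(?<name>...)`, so the groups are rewritten as `(?P<name>...)`; the pattern is otherwise the same. The function name `extract_permuted` is invented for this example, and the sample records are assumed to contain literal backslashes.

```python
import re

# Same permutation pattern, with Python-style named groups.
PERMUTED = re.compile(
    r"((BS-(?P<airline1>\w+).*DS-(?P<date1>[\w]+))"
    r"|(DS-(?P<date2>[\w]+).*BS-(?P<airline2>\w+)))"
)

def extract_permuted(record):
    """Return (airline, date) from whichever alternative matched, or None."""
    m = PERMUTED.search(record)
    if m is None:
        return None
    # Only one pair of groups is filled; the other pair is None.
    airline = m.group("airline1") or m.group("airline2")
    date = m.group("date1") or m.group("date2")
    return airline, date

string1 = r"abc/BS-QANTAS\\/DS-12JUL15\\dfd"
string2 = r"/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"

print(extract_permuted(string1))  # ('QANTAS', '12JUL15')
print(extract_permuted(string2))  # ('AIRFRANCE', '10JUN15')
```

With more than two fields the number of alternatives grows with the number of orderings, which is why this approach only stays manageable for small cases.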

Upvotes: 0

Shar1er80

Reputation: 9041

You mentioned that you've tried lookahead. What about lookbehind?

(?<=BS-|DS-)(\w+)

Tested at Regex101
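A small Python sketch of what the lookbehind returns. The values come back in the order they appear in the record, so the pattern alone does not say which one is the airline and which is the date. The second pattern, `LABELLED`, is not part of the answer; it is an assumed variant that captures the prefix as well so the values can be told apart. Function names are made up for illustration.

```python
import re

# The lookbehind pattern from the answer: both alternatives are 3 characters
# wide, which Python's fixed-width lookbehind accepts.
VALUES = re.compile(r"(?<=BS-|DS-)(\w+)")

# Assumed variant (not from the answer): capture the prefix too.
LABELLED = re.compile(r"(BS|DS)-(\w+)")

def values_in_order(record):
    """Return the captured values in the order they occur in the record."""
    return VALUES.findall(record)

def labelled_fields(record):
    """Return a dict mapping each prefix ('BS' or 'DS') to its value."""
    return dict(LABELLED.findall(record))

string1 = r"abc/BS-QANTAS\\/DS-12JUL15\\dfd"
string2 = r"/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"

print(values_in_order(string1))  # ['QANTAS', '12JUL15']
print(values_in_order(string2))  # ['10JUN15', 'AIRFRANCE']
print(labelled_fields(string2))  # {'DS': '10JUN15', 'BS': 'AIRFRANCE'}
```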

Upvotes: 1
