Reputation: 716
I'm not sure it's possible, but I've been trying to extract a piece of info from a non-standard log format. The log could look as follows:
field1:val1:extra1
val2:extra1
val3
So what I'm saying is val will always exist but field and extra won't. I'm trying to come up with a regex that will always extract val regardless of the existence of field and extra.
The closest I've got is:
/[^:]*:?([a-zA-Z0-9\-]*[^:]):?/mg
But it's still not quite right. Note that val is not a fixed length, and neither are field and extra.
Upvotes: 0
Views: 48
Reputation: 626802
I suggest you just use a branch reset group with all three possible alternatives capturing just the part you need into Group 1:
^(?|[^:]+:([^:]+):[^:]+|([^:]+):[^:]+|([^:]+))$
See the regex demo
Details:
^ - start of string
(?| - branch reset group start (inner capturing group IDs will start from the same number)
[^:]+:([^:]+):[^:]+ - 3 colon-separated parts, the middle one is captured into Group 1
| - or
([^:]+):[^:]+ - 2 colon-separated parts, the first one is captured into Group 1
| - or
([^:]+) - Group 1 containing just 1 chunk of 1 or more chars other than :
) - end of branch reset
$ - end of string.
Also, in case you need to get each part into the same named group whichever alternative matches, I can suggest adding the (?J) inline modifier (which sets PCRE_DUPNAMES so that duplicate group names are allowed) and using named capture groups:
(?J)^(?:(?<Field>[^:]+):(?<Val>[^:]+):(?<Extra>[^:]+)|(?<Val>[^:]+):(?<Extra>[^:]+)|(?<Val>[^:]+))$
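Branch reset groups and (?J) are PCRE features, and Python's standard re module supports neither. As a rough sketch of the same idea outside PCRE, the three alternatives can be written with ordinary capturing groups, taking whichever group took part in the match. The name extract_val is hypothetical and only used for illustration.

```python
import re

# Same three alternatives as the branch-reset pattern, but with ordinary
# capturing groups, because Python's re module has no (?|...) syntax.
PATTERN = re.compile(
    r"^(?:[^:]+:([^:]+):[^:]+"  # field:val:extra -> val lands in group 1
    r"|([^:]+):[^:]+"           # val:extra       -> val lands in group 2
    r"|([^:]+))$"               # val             -> val lands in group 3
)


def extract_val(line):
    """Return val from one log line, or None if the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # Only one alternative can succeed, so exactly one group is not None.
    return next(g for g in m.groups() if g is not None)


log = "field1:val1:extra1\nval2:extra1\nval3"
print([extract_val(line) for line in log.splitlines()])
# prints ['val1', 'val2', 'val3']
```

A line with more than three colon-separated parts matches none of the alternatives, so extract_val returns None for it.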
Upvotes: 1
Reputation: 338208
(?:([^:\r\n]+):(?=.*:))?([^:\r\n]+)(?::(.+))?
will match field always into group 1, val always into group 2, and extra always into group 3. In case field or extra do not exist, their groups will be empty.
Breakdown:
(?: # begin non-capturing group
([^:\r\n]+) # group 1: any character except ":" or line-break, repeat
: # a ":"
(?=.*:) # must be followed by another ":" somewhere in the remaining string
)? # end group, make optional
( # group 2
[^:\r\n]+ # any character except ":" or line-break, repeat
) # end group 2
(?: # begin non-capturing group
: # a ":"
(.+) # group 3: rest of the line
)? # end group, make optional
The look-ahead (?=.*:) is the crucial part. It prevents the engine from matching val into group 1 in the "val:extra" case.
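To see the look-ahead pattern at work on the three sample lines from the question, here is a minimal sketch using Python's re module (my choice of language; the thread does not say which engine is in use). The variable names are illustrative only.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"(?:([^:\r\n]+):(?=.*:))?([^:\r\n]+)(?::(.+))?")

log = "field1:val1:extra1\nval2:extra1\nval3"

# findall returns one (field, val, extra) tuple per match. Groups that did
# not take part in a match come back as empty strings.
rows = PATTERN.findall(log)
for field, val, extra in rows:
    print(repr(field), repr(val), repr(extra))
# prints:
# 'field1' 'val1' 'extra1'
# '' 'val2' 'extra1'
# '' 'val3' ''
```

Note that with re.search or re.match in Python, a group that did not take part is reported as None rather than as an empty string.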
Note that if the group values can be empty, like so:
field1:val1:extra1
:val2:extra1
:val3:
then simply change the + to * in the capturing groups.
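A sketch of that variant in Python, again as an assumption about the target language. Once every group uses *, the pattern can also match the empty string, so scanning a whole log with findall would yield extra empty matches; this example therefore applies the pattern to one line at a time with fullmatch. The helper name split_line is hypothetical.

```python
import re

# Variant with + changed to * in the three capturing groups, so that
# empty values are accepted.
PATTERN = re.compile(r"(?:([^:\r\n]*):(?=.*:))?([^:\r\n]*)(?::(.*))?")


def split_line(line):
    """Return (field, val, extra) for one line, or None if it does not match."""
    m = PATTERN.fullmatch(line)
    if m is None:
        return None
    # Groups that did not take part are None; normalise them to "".
    return tuple(g or "" for g in m.groups())


log = "field1:val1:extra1\n:val2:extra1\n:val3:"
rows = [split_line(line) for line in log.splitlines()]
for row in rows:
    print(row)
# prints:
# ('field1', 'val1', 'extra1')
# ('', 'val2', 'extra1')
# ('', 'val3', '')
```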
Upvotes: 4