I have tried to create a regex for the string below:
STRING: sou_u02_mlpv0747_CCF_ASB001_LU_FW_ALERT|/opt/app/medvhs/mvs/applications/cm_vm5/fwhome/UnifiedLogging|UL_\d{8}_CCF_ASB001_LU_sou_u02_mlpv0747_Primary.log.csv|FATAL|red|1h||fw_alert
REGEX: /^[^#]\w+\|[^\|]+\|\w+\|\w+\|\w*\|\w*\|([^\|]+|)\|\w*$/
I am unable to figure out the mistake here.
I created the above by adapting another regex, which works fine. Here it is, together with a sample line it matches:
/^[^#]\w+\|[^\|]+\|([^\|]+|)\|[rm]\|(in|out|old|new|arch|missing)\|\w+\|([0-9-,]+|)\|\w*\|\w*$/
sou_u02_mlpv0747_CCF_ASB001_LU_ODR|/opt/app/medvhs/mvs/applications/cm_vm5/components/CCF_ASB001_LU/SPOOL/ODR||r|out|30m|0400-1959|30m|gprs_in_stag
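A minimal reproduction of the problem, sketched with Python's `re` module. This is an assumption: the post does not say which regex engine is used. The `/.../` delimiters suggest Perl or JavaScript, but the constructs here should behave the same for this ASCII input. The variable names `target`, `reference`, `attempted` and `working` are illustrative only.

```python
import re

# Target line from the question (raw string keeps the literal backslash in "\d{8}").
target = r"sou_u02_mlpv0747_CCF_ASB001_LU_FW_ALERT|/opt/app/medvhs/mvs/applications/cm_vm5/fwhome/UnifiedLogging|UL_\d{8}_CCF_ASB001_LU_sou_u02_mlpv0747_Primary.log.csv|FATAL|red|1h||fw_alert"

# Sample line for the older, working regex.
reference = "sou_u02_mlpv0747_CCF_ASB001_LU_ODR|/opt/app/medvhs/mvs/applications/cm_vm5/components/CCF_ASB001_LU/SPOOL/ODR||r|out|30m|0400-1959|30m|gprs_in_stag"

# Both regexes from the question, without the /.../ delimiters.
attempted = re.compile(r"^[^#]\w+\|[^\|]+\|\w+\|\w+\|\w*\|\w*\|([^\|]+|)\|\w*$")
working = re.compile(r"^[^#]\w+\|[^\|]+\|([^\|]+|)\|[rm]\|(in|out|old|new|arch|missing)\|\w+\|([0-9-,]+|)\|\w*\|\w*$")

print(working.match(reference) is not None)  # True
print(attempted.match(target) is not None)   # False
```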
Can someone please help me? Any leads would be highly appreciated.
Let's start with a brief look at your source text (the first one you included).

It is composed of "sections" separated by the `|` char. This char (`|`) must be matched by `\|`. Remember the preceding backslash; otherwise a "bare" `|` means the alternative separator (you used it in one place).
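A quick way to see the difference between a bare and an escaped vertical bar, sketched with Python's `re` module (assumed here; other common flavors treat `|` the same way). The strings `"a|b"` are made-up examples:

```python
import re

# A bare "|" is alternation: r"a|b" means "a" OR "b",
# so it cannot match the three-character text "a|b" as a whole.
print(re.fullmatch(r"a|b", "a|b"))               # None

# An escaped "\|" is a literal vertical bar.
print(re.fullmatch(r"a\|b", "a|b") is not None)  # True
```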
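And now take a look at each section (between `|`). Each section is of one of two types:

- word chars (matched by `\w+`),
- other chars, i.e. anything except `|` (matched by `[^|]+`; here, between `[` and `]`, the vertical bar may be unescaped).

Now let's write each section and its "type":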
- `sou_u02_..._FW_ALERT` - word chars.
- `/opt/app/.../UnifiedLogging` - other chars (because of the slashes).
- `UL_\d{8}_..._Primary.log.csv` - other chars (because of `\d{8}` and the dots).
- `FATAL|red|1h` - 3 sections composed of word chars.
- an empty section, between two consecutive `|` chars.
- `fw_alert` - word chars.

And now, how to match these groups and the separating `|`:
- `\w+\|` - word chars and an (escaped) vertical bar.
- `(?:[^|]+\|){2}` - a non-capturing group `(?:...)` containing a sequence of "other" chars (`[^|]+`) and a vertical bar (`\|`), occurring 2 times (`{2}`).
- `(?:\w+\|){3}` - similar to the previous point, but with word chars, occurring 3 times.
- `([^|]+|)\|` - a capturing group `(...)` with 2 alternatives `...|...`. The first alternative is `[^|]+` (a sequence of "other" chars), and the second alternative is empty. After the capturing group there is `\|` to match the vertical bar.
- `\w+` - word chars. This time there is no `\|`, as this is the last section.

The regex assembled so far must be anchored with `^` (start of string) and `$` (end of string). So the whole regex matching your source text can be:
^\w+\|(?:[^|]+\|){2}(?:\w+\|){3}([^|]+|)\|\w+$
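The section breakdown and the assembled regex can be checked together. This is a sketch in Python's `re` (an assumed engine); the helper `kind` is hypothetical and simply labels each `|`-separated section the way the breakdown does:

```python
import re

line = r"sou_u02_mlpv0747_CCF_ASB001_LU_FW_ALERT|/opt/app/medvhs/mvs/applications/cm_vm5/fwhome/UnifiedLogging|UL_\d{8}_CCF_ASB001_LU_sou_u02_mlpv0747_Primary.log.csv|FATAL|red|1h||fw_alert"

def kind(section):
    """Hypothetical helper: label a section as word chars, other chars, or empty."""
    if re.fullmatch(r"\w+", section):
        return "word"
    return "other" if section else "empty"

kinds = [kind(s) for s in line.split("|")]
print(kinds)
# ['word', 'other', 'other', 'word', 'word', 'word', 'empty', 'word']

# The assembled regex; group 1 captures the (here empty) 7th section.
pattern = re.compile(r"^\w+\|(?:[^|]+\|){2}(?:\w+\|){3}([^|]+|)\|\w+$")
m = pattern.match(line)
print(m is not None, repr(m.group(1)))  # True ''
```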
Actually, the only capturing group can be written another way, as `([^|]*)`: without alternatives, but with `*` as the repetition count, which also allows empty content. Which variant to use is your choice.
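A small sketch, again assuming Python's `re`, suggesting the two forms of the capturing group behave the same. The two sample lines are hypothetical, shortened versions of the real one: one with an empty 7th section and one with `x,y` in it.

```python
import re

alt_form = re.compile(r"^\w+\|(?:[^|]+\|){2}(?:\w+\|){3}([^|]+|)\|\w+$")
star_form = re.compile(r"^\w+\|(?:[^|]+\|){2}(?:\w+\|){3}([^|]*)\|\w+$")

# Hypothetical sample lines with the same 8-section layout.
samples = [
    "a|/p|f.csv|FATAL|red|1h||fw_alert",
    "a|/p|f.csv|FATAL|red|1h|x,y|fw_alert",
]
for s in samples:
    print(repr(alt_form.match(s).group(1)), repr(star_form.match(s).group(1)))
# '' ''
# 'x,y' 'x,y'
```

The `*` form is a bit shorter and avoids an empty alternative, which some readers find easy to miss.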
The third field, `UL_\d{8}_CCF_ASB001_LU_sou_u02_mlpv0747_Primary.log.csv`, contains a backslash `\`, braces `{` and `}`, and dots `.`. None of these can be matched by `\w`, so the third `\w+` in your regex fails there.
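To see exactly which characters of that field fall outside `\w` (letters, digits and underscore), here is a sketch assuming Python's `re`; `field` is just an illustrative variable name:

```python
import re

field = r"UL_\d{8}_CCF_ASB001_LU_sou_u02_mlpv0747_Primary.log.csv"

# Distinct characters that \w does not match (\W is its complement).
non_word = sorted(set(re.findall(r"\W", field)))
print(non_word)                     # ['.', '\\', '{', '}']

# Therefore \w+ cannot cover the whole field.
print(re.fullmatch(r"\w+", field))  # None
```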
Note also that there is no need to escape a pipe `|` inside a character class: `[^|]+` is fine.
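Putting both remarks together: a sketch of the question's own regex with only the third field changed from `\w+` to `[^|]+`, written with unescaped pipes inside the classes. Python's `re` is assumed, and `fixed` is an illustrative name, not something from either post.

```python
import re

line = r"sou_u02_mlpv0747_CCF_ASB001_LU_FW_ALERT|/opt/app/medvhs/mvs/applications/cm_vm5/fwhome/UnifiedLogging|UL_\d{8}_CCF_ASB001_LU_sou_u02_mlpv0747_Primary.log.csv|FATAL|red|1h||fw_alert"

# Question's regex with the third \w+ replaced by [^|]+.
fixed = re.compile(r"^[^#]\w+\|[^|]+\|[^|]+\|\w+\|\w*\|\w*\|([^|]+|)\|\w*$")
print(fixed.match(line) is not None)  # True

# The unescaped and escaped class forms accept and reject the same text.
for cls in (r"[^|]+", r"[^\|]+"):
    print(cls, re.fullmatch(cls, "abc") is not None, re.fullmatch(cls, "a|b") is None)
# [^|]+ True True
# [^\|]+ True True
```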