Reputation: 31
I have a text file whose lines follow the pattern below.
"s|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}"
I am trying to extract the values for cap1, cap2 and CAP3.
I tried the regex "([a-z]|[|])cap1(\{(.*?)\})([a-z]|[|]|[0-9])"
but with no luck. Any help is appreciated.
Upvotes: 0
Views: 155
Reputation: 4139
As I understand it, you want to extract the values of cap1, cap2 and CAP3 one by one, so you need three regexes.
For cap1
cap1\{([^\}]*)\}
Explanation: cap1\{ matches the text cap1{, ([^\}]*) captures any characters except } into group $1, and \} matches the text }.
For cap2
cap2\[([^\]]*)\]
Explanation: cap2\[ matches the text cap2[, ([^\]]*) captures any characters except ] into group $1, and \] matches the text ].
For CAP3
CAP3\{\[([^\]]*)\]\|\[([^\]]*)\]\}
Explanation: CAP3\{ matches the text CAP3{, \[([^\]]*)\]\|\[([^\]]*)\] captures any characters except ] from each bracketed part into groups $1 and $2 respectively, and \} matches the text }.
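As a rough illustration of how the three patterns above could be applied in Perl, here is a minimal sketch. The hash name %found is only an assumption made for the example; the sample string is the one from the question.

```perl
use strict;
use warnings;

# Sample line from the question.
my $s = 's|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}';

# %found is a hypothetical container: name => array of captured strings.
my %found;

# One pattern per name, exactly as given above.
if ( $s =~ /cap1\{([^\}]*)\}/ ) {
    $found{cap1} = [ $1 ];
}
if ( $s =~ /cap2\[([^\]]*)\]/ ) {
    $found{cap2} = [ $1 ];
}
if ( $s =~ /CAP3\{\[([^\]]*)\]\|\[([^\]]*)\]\}/ ) {
    $found{CAP3} = [ $1, $2 ];
}

# Print whatever was captured for each name.
for my $name (qw(cap1 cap2 CAP3)) {
    my @values = @{ $found{$name} || [] };
    print "$name: ", join( ', ', @values ), "\n";
}
```

Each value still contains its inner pipes, so a further split /\|/ on a captured string would give the individual words if those are what you need.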
Additional: Thanks to @Borodin for the comment. You don't need lookaround for this task, but it can be useful if you want to search and replace only the captured part.
For cap1: (?<=cap1\{)([^\}]*)(?=\})
For cap2: (?<=cap2\[)([^\]]*)(?=\])
For CAP3: (?<=CAP3\{)\[([^\]]*)\]\|\[([^\]]*)\](?=\})
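To show what the lookaround version buys you in a substitution, here is a small sketch. The shortened input line and the replacement text REDACTED are assumptions made up for the example. Because the lookbehind and lookahead consume nothing, only the text between the braces is replaced. The second substitution is an alternative without lookaround that re-inserts the delimiters from capture groups.

```perl
use strict;
use warnings;

# Shortened, hypothetical input in the same shape as the question's data.
my $line = 'x|cap1{capture|these|values}|y|cap2[capture|these|values]';

# Lookaround version: the match covers only the text between the braces,
# so the replacement leaves "cap1{" and "}" untouched.
( my $with_lookaround = $line ) =~ s/(?<=cap1\{)([^\}]*)(?=\})/REDACTED/;

# Capture-group version: the delimiters are matched, then put back.
( my $with_groups = $line ) =~ s/(cap1\{)[^\}]*(\})/${1}REDACTED$2/;

print "$with_lookaround\n";
print "$with_groups\n";
```

Both substitutions should leave the cap2 part of the line alone, since the patterns are anchored on the literal text cap1{.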
Upvotes: 1
Reputation: 126762
I apologise: I initially mistook your question for something more trivial.
Essentially you want to perform a split on pipe | characters, excluding those found inside pairs of brackets [ ... ] or braces { ... }. As long as you don't need to take account of nesting inside brackets of the same type (i.e. braces will only ever contain brackets, and brackets will only ever contain braces) then it is simply done like this:
my @matches = $s =~ m{ \w+ ( \{ [^{}]* \} | \[ [^\[\]]* \] ) }gx;
print "$_\n" for @matches;
{capture|these|values}
[capture|these|values]
{[capture|these|values]|[capture|these|values]}
The data you show has no instances of braces containing braces, or brackets containing brackets, but I suspect that there is no theoretical limit to the nesting of your data, in which case some recursion is necessary.
The regex pattern in the program below defines the text that can appear inside a pair of matching brackets as a pipe-delimited sequence of items, each of which is a bracketed group [ ... ], a braced group { ... }, or a plain run of word characters such as capture and values.
The pattern that matches one such item is inside the second pair of capturing parentheses. It is a recursive pattern that calls itself using relative numbering, (?-1). That could also be written with absolute numbering as (?2), but then it would have to be changed if the number of preceding captures changed.
The complete pattern looks for and captures a series of word characters immediately before the recursive pattern, to account for the names cap1, cap2 etc. This allows the result of a global search to be assigned directly to a hash, with the result shown below.
use strict;
use warnings;
my $s = "s|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}";
my %captures = $s =~ m{
( (?> \w+ ) )
(
\{ (?-1) (?> \| (?-1) )* \} |
\[ (?-1) (?> \| (?-1) )* \] |
\w+
)
}gx;
use Data::Dump;
dd \%captures;
{
cap1 => "{capture|these|values}",
cap2 => "[capture|these|values]",
CAP3 => "{[capture|these|values]|[capture|these|values]}",
}
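The sample data never nests a brace inside a brace, so the run above does not really exercise the recursion. The sketch below feeds the same pattern a made-up line (the name cap4 and its contents are assumptions, not part of the question) and also tries the absolute-numbering form (?2) mentioned above, to check that both spellings behave alike.

```perl
use strict;
use warnings;

# Hypothetical input: braces nested inside braces, plus a bracket group.
my $s = 'a|b|cap4{x|{y|z}|[p|{q}]}|c';

# Relative recursion, as in the program above: (?-1) re-enters group 2.
my %relative = $s =~ m{
    ( (?> \w+ ) )
    (
        \{ (?-1) (?> \| (?-1) )* \} |
        \[ (?-1) (?> \| (?-1) )* \] |
        \w+
    )
}gx;

# Absolute recursion: (?2) names the same group by its number.
my %absolute = $s =~ m{
    ( (?> \w+ ) )
    (
        \{ (?2) (?> \| (?2) )* \} |
        \[ (?2) (?> \| (?2) )* \] |
        \w+
    )
}gx;

print "relative: $relative{cap4}\n";
print "absolute: $absolute{cap4}\n";
```

If another capturing group were added in front of the name, the (?2) version would need renumbering while the (?-1) version would not, which is the trade-off described above.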
It looks like you want all identifiers that are preceded by a pipe | character and followed by either a square or curly opening bracket, [ or {. This program will do that for you:
use strict;
use warnings;
use v5.10;
my $s = "s|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}";
for ( $s ) {
my @captures = /\|(\w+)[\[\{]/g;
say for @captures;
}
cap1
cap2
CAP3
Upvotes: 0
Reputation: 47282
Using a pattern such as this should work:
[{\[]+([^}{\]\[]+)[\]}]+
Code:
$searchText =~ m/[{\[]+([^}{\]\[]+)[\]}]+/
Example:
https://regex101.com/r/qI3fI6/1
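For completeness, here is a minimal sketch of running that pattern over the whole line with the /g flag in list context, so that every bracketed run is collected rather than just the first. The variable names are taken from the snippet above or invented for the example. Note that this pattern captures only the bracketed contents, not the cap1, cap2 or CAP3 names in front of them.

```perl
use strict;
use warnings;

# Sample line from the question.
my $searchText = 's|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}';

# With /g in list context the match returns every group-1 capture in order.
my @contents = $searchText =~ m/[{\[]+([^}{\]\[]+)[\]}]+/g;

print scalar(@contents), " captures\n";
print "$_\n" for @contents;
```

On the sample line this yields four captures: one each for cap1 and cap2, and two for CAP3, because the two bracketed parts inside its braces are matched separately.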
Upvotes: 0