user4911736

Reputation: 31

Extract part of a string using regex

I have a text file whose content follows the pattern below.

"s|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}"

I am trying to extract the values cap1, cap2, CAP3.

I tried the regex "([a-z]|[|])cap1(\{(.*?)\})([a-z]|[|]|[0-9])" but with no luck. Any help is appreciated.

Upvotes: 0

Views: 155

Answers (3)

fronthem

Reputation: 4139

As I understand it, you want to extract the values of cap1, cap2 and CAP3 one by one. That takes three regexes.

For cap1

cap1\{([^\}]*)\}

Explanation

cap1\{ match text cap1{,

([^\}]*) capture any characters except } to group $1,

\} match text }.

For cap2

cap2\[([^\]]*)\]

Explanation

cap2\[ match text cap2[,

([^\]]*) capture any characters except ] to group $1,

\] match text ].

For CAP3

CAP3\{\[([^\]]*)\]\|\[([^\]]*)\]\}

Explanation

CAP3\{ match text CAP3{,

\[([^\]]*)\]\|\[([^\]]*)\] capture any characters except ] to groups $1, $2 respectively,

\} match text }.
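The three patterns above can be tried together in a short Perl script. This is only a sketch: it assumes the sample line from the question is held in a variable named $s, and the variable names ($cap1, $cap2, $cap3a, $cap3b) are made up for illustration.

```perl
use strict;
use warnings;

# Sample line from the question (single-quoted so nothing is interpolated).
my $s = 's|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}';

# A match in list context returns the capture groups ($1, $2, ...).
my ($cap1)           = $s =~ /cap1\{([^\}]*)\}/;
my ($cap2)           = $s =~ /cap2\[([^\]]*)\]/;
my ($cap3a, $cap3b)  = $s =~ /CAP3\{\[([^\]]*)\]\|\[([^\]]*)\]\}/;

print "cap1: $cap1\n";
print "cap2: $cap2\n";
print "CAP3: $cap3a and $cap3b\n";
```

With the sample line, each of the four variables should end up holding capture|these|values.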

Additional: thanks to @Borodin for the comment. You don't need lookaround for this task, but if you want to do a search and replace, lookaround may be needed.

For cap1: (?<=cap1\{)([^\}]*)(?=\})

For cap2: (?<=cap2\[)([^\]]*)(?=\])

For CAP3: (?<=CAP3\{)\[([^\]]*)\]\|\[([^\]]*)\](?=\})
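As a rough illustration of the search-and-replace case, the cap1 lookaround pattern can be used with Perl's s/// so that only the text between the braces is replaced. The shortened input string and the replacement text NEW are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Shortened, hypothetical input in the same shape as the question's data.
my $s = 'x|cap1{capture|these|values}|y|cap2[capture|these|values]|z';

# Copy the string, then replace only what sits between "cap1{" and "}".
# The lookbehind and lookahead assert the delimiters without consuming them,
# so they survive the substitution untouched.
(my $replaced = $s) =~ s/(?<=cap1\{)[^\}]*(?=\})/NEW/;

print "$replaced\n";   # x|cap1{NEW}|y|cap2[capture|these|values]|z
```

The same effect is possible without lookaround by capturing the delimiters and putting them back, for example s/(cap1\{)[^\}]*(\})/$1NEW$2/, so which form to use is largely a matter of taste.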

Upvotes: 1

Borodin

Reputation: 126762

Update

I apologise -- I initially mistook your question for something more trivial

Essentially you want to perform a split on pipe | characters, excluding those found inside pairs of brackets or braces [ ... ] or { ... }. As long as you don't need to take account of nesting inside brackets of the same type (i.e. braces will only ever contain brackets, and brackets will only ever contain braces) then it is simply done like this

my @matches = $s =~ m{ \w+ ( \{ [^{}]* \} | \[ [^\[\]]* \] ) }gx;
print "$_\n" for @matches;

output

{capture|these|values}
[capture|these|values]
{[capture|these|values]|[capture|these|values]}

The data you show has no instances of braces containing braces, or brackets containing brackets, but I suspect that there is no theoretical limit to the nesting of your data, in which case some recursion is necessary

The regex pattern in the program below defines the text that can appear inside a pair of matching brackets as a pipe-delimited sequence of

  • another pair of matching brackets and their content [ ... ]
  • another pair of matching braces and their content { ... }
  • a sequence of word characters like capture and values

The pattern that matches this is inside the second pair of capturing parentheses. It is a recursive pattern that calls itself using relative numbering (?-1). That could also be absolute numbering (?2), but it would then have to be changed if the number of preceding captures changed

The complete pattern looks for and captures a series of word characters immediately before the recursive pattern to account for the cap1, cap2 etc. This allows the result of a global search to be assigned directly to a hash, with the result shown below

use strict;
use warnings;

my $s = "s|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}";

my %captures = $s =~ m{
    ( (?> \w+ ) )
    (
        \{ (?-1) (?> \| (?-1) )* \} |
        \[ (?-1) (?> \| (?-1) )* \] |
        \w+
    )
}gx;

use Data::Dump;
dd \%captures;

output

{
  cap1 => "{capture|these|values}",
  cap2 => "[capture|these|values]",
  CAP3 => "{[capture|these|values]|[capture|these|values]}",
}
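To see what the recursion buys, here is a small sketch that runs the same pattern over a made-up string with braces nested inside braces, which the sample data never contains. The input string, the key capA and the variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input: same-type nesting, which the non-recursive pattern cannot handle.
my $nested = 'x|capA{a|{b|c}|[d|{e}]}|y';

# Same pattern as above: group 1 is the identifier, group 2 is the
# recursive bracketed value. (?-1) recurses into group 2 itself.
my %nested_captures = $nested =~ m{
    ( (?> \w+ ) )
    (
        \{ (?-1) (?> \| (?-1) )* \} |
        \[ (?-1) (?> \| (?-1) )* \] |
        \w+
    )
}gx;

print "$_ => $nested_captures{$_}\n" for sort keys %nested_captures;
```

For this input the hash should hold a single key, capA, whose value is the whole balanced group {a|{b|c}|[d|{e}]}. The lone x and y fields are skipped because nothing bracketed follows them.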



Original answer

It looks like you want all identifiers that are preceded by a pipe | character and followed by either a square or curly opening bracket [ or {

This program will do that for you

use strict;
use warnings;
use v5.10;

my $s = "s|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}";

for ( $s ) {
    my @captures = /\|(\w+)[\[\{]/g;
    say for @captures;
}

output

cap1
cap2
CAP3

Upvotes: 0

l'L'l

Reputation: 47282

Using a pattern such as this should work:

[{\[]+([^}{\]\[]+)[\]}]+

Code:

$searchText =~ m/[{\[]+([^}{\]\[]+)[\]}]+/

Example:

https://regex101.com/r/qI3fI6/1
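A hedged sketch of how this pattern behaves on the question's sample line: the /g flag and the list assignment are additions made here so that every match is collected, and @values is a name chosen for illustration.

```perl
use strict;
use warnings;

# Sample line from the question.
my $searchText = 's|o|m|j|n|k|v|a|l|u|e|s|cap1{capture|these|values}|s|o|m|j|n|k|v|a|l|u|e|s|cap2[capture|these|values]|s|o|m|j|n|k|v|a|l|u|e|s|CAP3{[capture|these|values]|[capture|these|values]}';

# With /g in list context, the match returns group 1 of every match.
my @values = $searchText =~ m/[{\[]+([^}{\]\[]+)[\]}]+/g;

print scalar(@values), " values found\n";
print "$_\n" for @values;
```

Note that this approach returns only the innermost bracketed text: the names cap1, cap2 and CAP3 are not captured, and the two bracketed groups inside CAP3 come back as separate values, giving four values in total for the sample line.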

Upvotes: 0
