Reputation: 106

Regex Word Boundary in Perl not yielding expected results

I'm having an issue pulling data from a string between two keywords. I understand that in regex I'm supposed to use the \b word-boundary anchor, and I've written the following test example; however, it seems to give me the whole string instead of just the portion I want.

For example, the string: "here are more string words START OF INFORMATION SECTION some other stuff"

I am gathering text between "START" and "SECTION".

So I'm expecting "START OF INFORMATION SECTION", I believe.

This is the snippet I have written in Perl, but it doesn't yield the results I expected.

#!/usr/bin/perl

# This is perl 5, version 22, subversion 1 (v5.22.1) built for cygwin-thread-multi
use POSIX;

my $text = "here are more string words START OF INFORMATION SECTION some other stuff";

print "Original String: $text\n";

# this should provide me with the specific text between my two boundary words
$text =~ /\bSTART\b(.*?)\bSECTION\b/;

print "New String: $text\n";

Upvotes: 0

Answers (3)

Borodin

Reputation: 126772

Your code simply tests whether the regex pattern matches the string; the match returns a true or false value to indicate whether it succeeded, and you discard that value.

If there was a match, then the strings captured by the parentheses in the regex pattern are assigned to the capture variables $1, $2, etc.

It's unclear what you need to do, but this program prints everything between START and SECTION, which in this case is " OF INFORMATION " (including the surrounding spaces).

There's no need for use POSIX, but use strict and use warnings 'all' are essential.

#!/usr/bin/perl

use strict;
use warnings 'all';

my $text = "here are more string words START OF INFORMATION SECTION some other stuff";

print "Original String: $text\n";

if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
    my $section = $1;
    print "New String:      $section\n";
}

Output:

Original String: here are more string words START OF INFORMATION SECTION some other stuff
New String:       OF INFORMATION
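
The captured text keeps the spaces on either side of OF INFORMATION, which is why the output has one more space after the label than the print statement contains. If the text is wanted without that surrounding whitespace, a minimal sketch could look like this (the \s* additions and the $trimmed variable are assumptions about what is wanted, not part of the original question):

```perl
#!/usr/bin/perl

use strict;
use warnings 'all';

my $text = "here are more string words START OF INFORMATION SECTION some other stuff";

# \s* outside the capture group consumes the whitespace next to the keywords,
# so $1 holds only the text in between
my $trimmed;
if ( $text =~ /\bSTART\b\s*(.*?)\s*\bSECTION\b/ ) {
    $trimmed = $1;
    print "Trimmed: [$trimmed]\n";
}
```

The square brackets in the print are only there to make any leading or trailing whitespace visible.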

Upvotes: 1

ikegami

Reputation: 386696

The match operator doesn't change the string it matches.

You can use either of the following to inspect the captured string:

if ( $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
    my $section = $1;
    print "New String: $section\n";
}

if ( my ($section) = $text =~ /\bSTART\b(.*?)\bSECTION\b/ ) {
    print "New String: $section\n";
}
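
If the goal is for $text itself to end up holding the matched portion, as the final print in the question suggests, one option is to assign the capture back to it. A sketch under that assumption ($original and $matched are illustrative names):

```perl
use strict;
use warnings 'all';

my $text = "here are more string words START OF INFORMATION SECTION some other stuff";
my $original = $text;

# A successful match leaves $text untouched
my $matched = $text =~ /\bSTART\b(.*?)\bSECTION\b/;
print "Unchanged after match\n" if $matched && $text eq $original;

# To change $text, assign to it explicitly; here the parentheses wrap the
# keywords too, so the capture is the whole START ... SECTION span
if ( $text =~ /\b(START\b.*?\bSECTION)\b/ ) {
    $text = $1;
}

print "New String: $text\n";
```

Doing the assignment inside the if means $text is only overwritten when the pattern actually matched.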

Upvotes: 0

rock321987

Reputation: 11042

You should use this:

$text =~ /\b(START\b(.*?)\bSECTION)\b/;
print "New String: $1\n";

$1 is the first captured group.

As suggested by Borodin:

if ( $text =~ /\b(START\b(.*?)\bSECTION)\b/ ) {
    my $tmp = $1;
    print "New String:      $tmp\n";
}
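
Because this pattern nests one pair of parentheses inside another, it sets two capture variables. Groups are numbered by the position of their opening parenthesis, so $1 is the outer group and $2 the inner one. A small sketch of the difference (the names $whole and $inner are illustrative):

```perl
use strict;
use warnings 'all';

my $text = "here are more string words START OF INFORMATION SECTION some other stuff";

# In list context the match returns the captures in group order
my ($whole, $inner) = $text =~ /\b(START\b(.*?)\bSECTION)\b/;

print "Whole: [$whole]\n";   # the keywords and everything between them
print "Inner: [$inner]\n";   # only the text between the keywords, spaces included
```

If only the outer group is needed, the inner parentheses can be dropped or written as a non-capturing group, (?:...).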

Upvotes: 0
