user1768884

Reputation: 1037

Why doesn't this Regex match in Perl?

I have a string that can read something like this (although not always, numbers can vary).

Board Length,45,inches,color,board height,8,inches,black,store,wal-mart,Board weight,20,dollars

I am trying to match the 45 that follows Board Length with this regex.

if ($string =~/Board Length,(\d+\.\d+)/){

    print $string;

}

Is the formatting wrong? I thought \d+ would match as many digits as needed, \. would match a literal '.', and \d+ would match any digits after the decimal (if there are any).

Upvotes: 1

Views: 74

Answers (4)

David W.

Reputation: 107030

You have the following expression:

$string =~/Board Length,(\d+\.\d+)/

Your string is this:

Board Length,45,inches

The literal Board Length, at the start of your pattern matches the start of the string. However, the rest of your pattern requires one or more digits, followed by a period, followed by one or more digits. That doesn't match 45, because there's no decimal point there.
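To see this for yourself, here is a minimal sketch that runs the pattern from the question against a whole number and a decimal. The helper name decimal_board_length is made up for illustration; it is not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured number, or undef when the
# pattern from the question does not match.
sub decimal_board_length {
    my ($string) = @_;
    return $string =~ /Board Length,(\d+\.\d+)/ ? $1 : undef;
}

for my $string ('Board Length,45,inches', 'Board Length,45.5,inches') {
    my $number = decimal_board_length($string);
    print defined $number
        ? "matched $number in: $string\n"
        : "no match in: $string\n";
}
```

Only the second string should report a match, because only it has a period between two runs of digits.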

The question is what you are trying to match. For example, if the number is surrounded by commas, you could do this:

$string =~ /Board Length,([^,]+),/;
my $number = $1;

[^,] means "not a comma", so you're capturing everything from just after the first comma up to the next one. This lets you capture 45, 45.32, and even 4.5e+10: anything between the two commas.

Note that you use $1 for your first capture group and not $_.
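As a rough sketch of how that behaves, the snippet below builds a few test strings and prints what gets captured. The helper board_length_field and the test strings are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns whatever sits between "Board Length,"
# and the next comma, or undef if the pattern does not match.
sub board_length_field {
    my ($string) = @_;
    return $string =~ /Board Length,([^,]+),/ ? $1 : undef;
}

for my $value ('45', '45.32', '4.5e+10') {
    my $string = "Board Length,$value,inches,color";
    my $field  = board_length_field($string);
    print defined $field ? "captured: $field\n" : "no match\n";
}
```

Checking the match before reading $1 matters: after a failed match, $1 still holds whatever the last successful match left there.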

Another way is to use non-greedy matching:

$string =~ /Board Length,(.+?),/;
my $number = $1;

What happens if what is captured isn't a number? You can test for that using the looks_like_number function from Scalar::Util (which has been included in Perl distributions for a long time):

use Scalar::Util    qw(looks_like_number);

my $string = "Board Length,Extra long,feet,...";
...
$string =~ /Board Length,(.+?),/;
my $number = $1;

if ( looks_like_number( $number ) ) {
    print "$number is a number\n";
}
else {
    print "Nope. $number isn't a number\n";
}

Upvotes: 0

Andy Lester

Reputation: 93636

You are not printing what you capture. You're printing $_, and we don't know what it contains.

if ($string =~/Board Length,(\d+\.\d+)/){
    print $_;
}

What I think you want is:

if ($string =~/Board Length,(\d+\.\d+)/){
    print $1;
}

Upvotes: 1

Brian Mego

Reputation: 1469

You are absolutely right about what that should match. However, without the '?' character, you are specifying that all of those pieces must be present.

\d+\.\d+

This means "1 or more digits, a period, 1 or more digits".

1.5, 253333.7, and 0.0 would all match. However, your example uses 45, which has no "." in it and no digits after it. There are a few solutions to your problem, the most foolproof of which was given by mpapec: allow the decimal point and the following digits to be optional.

(\.\d+)?

The catch is that wrapping it in () makes it another capture group, which you may or may not want. Putting ?: at the start of the group means "use this as a group, but don't capture it." Hence:

(?:\.\d+)?

The other option is not to do the grouping, and instead make both the decimal itself optional and the digits after the decimal ZERO or more instead of ONE or more. That would look something like this:

\d+\.?\d*
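The two options behave the same on most input but not all of it. Here is a small sketch comparing them; the hash of named patterns, the capture_with helper and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# The two alternatives described above, embedded in the full pattern.
my %patterns = (
    'optional group'  => qr/Board Length,(\d+(?:\.\d+)?)/,
    'optional pieces' => qr/Board Length,(\d+\.?\d*)/,
);

# Hypothetical helper: returns the capture for the named pattern, or undef.
sub capture_with {
    my ($name, $string) = @_;
    return $string =~ $patterns{$name} ? $1 : undef;
}

for my $value ('45', '45.5', '45.') {
    my $string = "Board Length,$value,inches";
    for my $name (sort keys %patterns) {
        my $got = capture_with($name, $string);
        print "$name captured '$got' from '$value'\n" if defined $got;
    }
}
```

The one difference shows up on a trailing period: for '45.' the grouped version captures 45, while the ungrouped version captures 45. including the period. Which one you want depends on your data.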

Upvotes: 2

mpapec

Reputation: 50637

As you have written it, the decimal point and the digits after it are mandatory. Use (\.\d+)? to make that part optional (shown here as a non-capturing group):

if ($string =~/Board Length,(\d+(?:\.\d+)?)/)
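For completeness, a minimal runnable sketch that applies this pattern to the sample string from the question and prints the capture. The helper name board_length is an assumption for illustration.

```perl
use strict;
use warnings;

# Sample string from the question.
my $string = 'Board Length,45,inches,color,board height,8,inches,black,'
           . 'store,wal-mart,Board weight,20,dollars';

# Hypothetical helper: returns the number after "Board Length,", with or
# without a decimal part, or undef if there is none.
sub board_length {
    my ($text) = @_;
    return $text =~ /Board Length,(\d+(?:\.\d+)?)/ ? $1 : undef;
}

my $length = board_length($string);
print defined $length ? "Board length: $length\n" : "No board length found\n";
```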

Upvotes: 2
