EverythingRightPlace

Reputation: 1197

Matching all the numbers

I asked in another topic about matching numbers like 123. That turned out to be too narrow, and as I get deeper into regex I see that you really have to define everything. So I asked about exponential notation and got this answer: /^keyword\s+(-?(?:\d+|\d*\.\d*)(?:[Ee]-?(?:\d+|\d*\.\d*))?)/. I tried to understand it but have failed so far.

So now I am asking more specifically. I need to match numbers such as these:
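For readers who find the one-line pattern above hard to parse, here is a sketch of the same pattern spread out with the /x modifier. The comments and the four test lines are additions for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Same pattern as quoted above, only spread out with /x and commented.
my $re = qr/
    ^keyword \s+            # keyword at start of line, then whitespace
    (                       # group 1: the whole number
      -?                    # optional minus sign
      (?: \d+               # plain digits ...
        | \d* \. \d*        # ... or optional digits, a dot, optional digits
      )
      (?: [Ee] -?           # optional exponent with optional minus sign
          (?: \d+ | \d* \. \d* )
      )?
    )
/x;

for my $line ('keyword 13', 'keyword -9e-4', 'keyword .151', 'keyword 83.12300') {
    my $got = $line =~ $re ? $1 : '(no match)';
    print "$line => $got\n";
}
```

Note that the last line yields 83 rather than 83.12300. The \d+ alternative is tried first and succeeds, and nothing after the group forces the engine to consume the rest of the number.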

13
-999
83.12300
.151
-.213
1e14
124e2
-9e-4

You got it, the regular math stuff.

To be even more specific, here is my Perl code. I am searching for keyword at the start of a line and need to extract the value that follows it. I'd like to do this with a single regex, because my workaround with the or-operator || seems to cause problems.

my $value;
open(my $fh, '<', 'data.dat') or die "error on opening data: $!\n";
while (my $line = <$fh>) {
        if (($line =~ /^keyword\s+(-?(?:\d+|\d*\.\d*)(?:[Ee]-?(?:\d+|\d*\.\d*))?)/x) || ($line =~ /^keyword\s*(\d*\.\d*)/)) {
                $value = $1;
        }
}
close($fh);

Edit

Thanks to all for the hints so far.

Upvotes: 1

Views: 711

Answers (4)

EverythingRightPlace

Reputation: 1197

Thanks to your informative posts and what I have read over the last few days, I now understand more of the regex structure. For this rather simple task I don't want to use additional modules/packages and want to stick to a regex. I ran some tests and made changes to remove redundancy and fit the pattern to my task: there won't be several numbers on one line, there can be whitespace on the line, and the end of a number is marked by a semicolon. To summarize, here is my final code. Thanks all for the help.

#!/usr/bin/perl

use strict;
use warnings;

my @numbers=(
"keyword 152;",
"keyword 12.23;",
"keyword -2.001;",
"keyword .123;",
"keyword -12.;",
"keyword 55.44.33;",
"keyword 3e14;",
"keyword -3.000e0014;",
"keyword 5e-04;",
"   keyword     5e-04;  ",
"keyword 5e-04  ;",
"keyword .1e2;",
"keyword 9.e3;",
"keyword -0.01E-03;",
"keyword 1.3e-03;",
"keyword 1dd;",
"keyword -12E3e1;",
"keyword -.e.;",
"keyword -.e-.;");

for (@numbers) {

if (    /\s* keyword \s+        # stuff before matched number
    ( -?            # optional minus sign
      (?:           # non-capturing group
        (?:\d+\.?\d*)       # leading digits, optionally a dot and more digits
        |           # or
        (?:\.\d+)       # no leading digit: a dot followed by required digits
      )
    (?:[Ee]-?\d+)?      # optional exponential notation
    )           # end of group to be matched
    ;\s*            # stuff after matched number
    /x) {

print "<<__$_\__>>\n\t $1 \n";
} else { 
print "<<__$_\__>>\n\t !!!!! no matching here !!!!!\n";
}
}

Output:

<<__keyword 152;__>>
     152 
<<__keyword 12.23;__>>
     12.23 
<<__keyword -2.001;__>>
     -2.001 
<<__keyword .123;__>>
     .123 
<<__keyword -12.;__>>
     -12. 
<<__keyword 55.44.33;__>>
     !!!!! no matching here !!!!!
<<__keyword 3e14;__>>
     3e14 
<<__keyword -3.000e0014;__>>
     -3.000e0014 
<<__keyword 5e-04;__>>
     5e-04 
<<__    keyword     5e-04;  __>>
     5e-04 
<<__keyword 5e-04   ;__>>
     !!!!! no matching here !!!!!
<<__keyword .1e2;__>>
     .1e2 
<<__keyword 9.e3;__>>
     9.e3 
<<__keyword -0.01E-03;__>>
     -0.01E-03 
<<__keyword 1.3e-03;__>>
     1.3e-03 
<<__keyword 1dd;__>>
     !!!!! no matching here !!!!!
<<__keyword -12E3e1;__>>
     !!!!! no matching here !!!!!
<<__keyword -.e.;__>>
     !!!!! no matching here !!!!!
<<__keyword -.e-.;__>>
     !!!!! no matching here !!!!!

PS: I have read that ?: might not save resources while the code is running, and it makes the regex less readable, so one might leave it out.
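The output above shows that "keyword 5e-04  ;" is rejected because the semicolon must follow the number directly. If whitespace before the semicolon should be tolerated (an assumption, since the answer does not say whether the rejection is intended), a variant could look like the sketch below. The helper name extract_value and the in-memory sample data are hypothetical stand-ins, and the ^ and $ anchors are an extra tightening that the code above does not have.

```perl
use strict;
use warnings;

# extract_value is a hypothetical helper name, not part of the code above.
sub extract_value {
    my ($line) = @_;
    return $line =~ /^\s* keyword \s+
                     ( -? (?: \d+ \.? \d* | \. \d+ ) (?: [Ee] -? \d+ )? )
                     \s* ; \s* $/x
        ? $1
        : undef;
}

# Stand-in for data.dat: an in-memory filehandle with made-up lines.
my $data = "other 1;\nkeyword 5e-04  ;\nkeyword 55.44.33;\n";
open(my $fh, '<', \$data) or die "error on opening data: $!\n";
my $value;
while (my $line = <$fh>) {
    my $found = extract_value($line);
    $value = $found if defined $found;   # keep the last value that matched
}
close($fh);
print defined $value ? "value: $value\n" : "no value found\n";
```

Returning undef for non-matching lines keeps a stale $1 from an earlier match from leaking into $value.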

Upvotes: 0

hmatt1

Reputation: 5139

There is another way to do this, and you don't need regular expressions for it. You can use looks_like_number from Scalar::Util.

Here's an example from the question "How do I tell if a variable has a numeric value in Perl?", pasted here for you.


Example:

#!/usr/local/bin/perl

use warnings;
use strict;

use Scalar::Util qw(looks_like_number);

my @exprs = qw(1 5.25 0.001 1.3e8 foo bar 1dd);

foreach my $expr (@exprs) {
    print "$expr is", looks_like_number($expr) ? '' : ' not', " a number\n";
}

Gives this output:

1 is a number
5.25 is a number
0.001 is a number
1.3e8 is a number
foo is not a number
bar is not a number
1dd is not a number

Edit, following @borodin's comment:

You would use it in a way like this:

my $value;
open(my $fh, '<', 'data.dat') or die "error on opening data: $!\n";
while (my $line = <$fh>) {
        # capture everything after the keyword, then let looks_like_number decide
        if ($line =~ /^keyword +(.*)/) {
             my $number = $1;
             if ( looks_like_number($number) ) {
                 $value = $number;
             }
        }
}
close($fh);
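One thing to be aware of in the loop above: (.*) captures everything up to the end of the line, so a line with extra text after the number is rejected as a whole. If only the first token after the keyword matters (an assumption about the data format), a sketch like the following may be closer to what is wanted. The helper name number_after_keyword is made up for this example.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# number_after_keyword is a hypothetical helper name used only for this sketch.
sub number_after_keyword {
    my ($line) = @_;
    # \S+ grabs only the first whitespace-delimited token after the keyword
    my ($token) = $line =~ /^keyword\s+(\S+)/ or return;
    return looks_like_number($token) ? $token : undef;
}

for my $line ('keyword 1.3e8', 'keyword -9e-4 trailing text', 'keyword 1dd') {
    my $number = number_after_keyword($line);
    print "$line => ", defined $number ? $number : 'not a number', "\n";
}
```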

edit: if you have to have a regex, you can use an expression like this:

 #!/bin/perl
 use strict;
 use warnings;

 my @numbers = ( 'keyword 13',
                 ' word   25',
                 'keyword -999',
                 'keyword 83.12300',
                 'keyword  .151',
                 'keyword -.213',
                 'keyword 1e14',
                 'keyword 124e2',
                 'keyword -9e-4 ',
                 ' keyword  e43e',
                 'keyword 4.5.6',
                 'keyword 4..e',
                 'keyword NaN',
                 'keyword Inf');

 for (@numbers) {

      if ( /^keyword +(-?((\d+\.?\d*)|(\d*\.?\d+))([Ee]-?\d+)?)/ ) {

         print "$1 is a number\n";

     } else {
         print "$_ does not match keyword or is not a number\n";
     }

 }
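Because the pattern above has no anchor after the number, lines such as 'keyword 4.5.6' still match on their leading part. If such lines should be rejected (an assumption about the intent), appending \s*$ is one way to do it. The sketch below compares both forms; the variable names $original and $anchored are made up for the comparison.

```perl
use strict;
use warnings;

# $original is the pattern from the answer; $anchored adds \s*$ so that the
# number (plus optional trailing whitespace) must run to the end of the line.
my $original = qr/^keyword +(-?((\d+\.?\d*)|(\d*\.?\d+))([Ee]-?\d+)?)/;
my $anchored = qr/$original\s*$/;

for my $line ('keyword 83.12300', 'keyword -9e-4 ', 'keyword 4.5.6', 'keyword 4..e') {
    my $loose  = $line =~ $original ? $1 : 'no match';
    my $strict = $line =~ $anchored ? $1 : 'no match';
    print "'$line': unanchored => $loose, anchored => $strict\n";
}
```

With the unanchored pattern the last two lines report 4.5 and 4. as numbers. The anchored form rejects both.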

Upvotes: 1

xcramps

Reputation: 1213

Go to CPAN and get Regexp::Common.

Use it like this:

use Regexp::Common;

my $re = $RE{num}{real};

if ( $line =~ /^keyword\s+($re)/ ) {
  $value = $1;
}

Much easier than do-it-yourself regular expression rolling.

Upvotes: 2

Dan

Reputation: 10786

The second regex in your code seems to be redundant; you can safely remove it. The first regex should match all your test cases. Is there anything it doesn't seem to be working with?

You should also tweak your regex, because it currently considers -.e-. to be a number. This comes from \d*\.\d*, which matches a lone dot. You could try (?:\d+(?:\.\d*)?|\.\d+) instead, which matches either 1) digits, 2) digits followed by a decimal point and possibly more digits, or 3) a decimal point followed by digits.
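As a rough check of the suggestion, the sketch below drops the proposed sub-pattern into the question's regex. Using it for the exponent as well as the mantissa is an assumption about what "instead of what you have" covers, and the variable names are made up.

```perl
use strict;
use warnings;

# The sub-pattern suggested above; reusing it for the exponent is an assumption.
my $num = qr/(?:\d+(?:\.\d*)?|\.\d+)/;
my $re  = qr/^keyword\s+(-?$num(?:[Ee]-?$num)?)/;

for my $line ('keyword 83.12300', 'keyword -.213', 'keyword -9e-4', 'keyword -.e-.') {
    my $got = $line =~ $re ? $1 : '(no match)';
    print "$line => $got\n";
}
```

With this change -.e-. no longer matches, and 83.12300 is captured in full because the optional fractional part is attached to the \d+ branch rather than being a separate alternative.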

Upvotes: 1
