Reputation: 1197
I asked in another topic about matching numbers like 123. That question was too narrow, and as I get deeper into regex I see that you really have to define everything. So I asked for exponential notation and got this answer: /^keyword\s+(-?(?:\d+|\d*\.\d*)(?:[Ee]-?(?:\d+|\d*\.\d*))?)/. I tried to understand it but have failed so far.
So now I am asking more specifically. I need to match numbers; here are some examples:
13
-999
83.12300
.151
-.213
1e14
124e2
-9e-4
You got it, the regular math stuff.
And to be even more specific, here is my Perl code. I am searching for keyword on a line and need to get a value from that line. I'd like to get this value with a single regex, because my workaround with the or-operator || seems to cause problems.
my $value;
open(FILE,"data.dat") or die "error on opening data: $!\n";
while (my $line = <FILE>) {
    if (($line =~ /^keyword\s+(-?(?:\d+|\d*\.\d*)(?:[Ee]-?(?:\d+|\d*\.\d*))?)/x) || ($line =~ /^keyword\s*(\d*\.\d*)/)) {
        $value = $1;
    }
}
close(FILE);
Edit: Thanks to all for the hints so far.
Upvotes: 1
Views: 711
Reputation: 1197
Thanks to your informative posts and what I have read over the last few days, I now understand more of the regex structure. For this rather simple task I don't want to use additional modules/packages and want to stick to a regex. I ran some tests and made changes to remove redundancy and fit my task: there won't be several numbers on one line, the line may contain whitespace, and the end of a number is marked by a semicolon. To summarize, here is my final code. Thanks to all for the help.
#!/usr/bin/perl
use strict;
use warnings;
my @numbers=(
"keyword 152;",
"keyword 12.23;",
"keyword -2.001;",
"keyword .123;",
"keyword -12.;",
"keyword 55.44.33;",
"keyword 3e14;",
"keyword -3.000e0014;",
"keyword 5e-04;",
" keyword 5e-04; ",
"keyword 5e-04 ;",
"keyword .1e2;",
"keyword 9.e3;",
"keyword -0.01E-03;",
"keyword 1.3e-03;",
"keyword 1dd;",
"keyword -12E3e1;",
"keyword -.e.;",
"keyword -.e-.;");
for (@numbers) {
    if ( /\s* keyword \s+       # stuff before matched number
          (  -?                 # optional minus sign
             (?:                # group without capturing
                (?:\d+\.?\d*)   # leading digits, optionally a point and more digits
                |               # or
                (?:\.\d+)       # no leading digit: a point followed by at least one digit
             )
             (?:[Ee]-?\d+)?     # optional exponential notation
          )                     # end of group to be matched
          ;\s*                  # stuff after matched number
        /x) {
        print "<<__$_\__>>\n\t $1 \n";
    } else {
        print "<<__$_\__>>\n\t !!!!! no matching here !!!!!\n";
    }
}
Output:
<<__keyword 152;__>>
152
<<__keyword 12.23;__>>
12.23
<<__keyword -2.001;__>>
-2.001
<<__keyword .123;__>>
.123
<<__keyword -12.;__>>
-12.
<<__keyword 55.44.33;__>>
!!!!! no matching here !!!!!
<<__keyword 3e14;__>>
3e14
<<__keyword -3.000e0014;__>>
-3.000e0014
<<__keyword 5e-04;__>>
5e-04
<<__ keyword 5e-04; __>>
5e-04
<<__keyword 5e-04 ;__>>
!!!!! no matching here !!!!!
<<__keyword .1e2;__>>
.1e2
<<__keyword 9.e3;__>>
9.e3
<<__keyword -0.01E-03;__>>
-0.01E-03
<<__keyword 1.3e-03;__>>
1.3e-03
<<__keyword 1dd;__>>
!!!!! no matching here !!!!!
<<__keyword -12E3e1;__>>
!!!!! no matching here !!!!!
<<__keyword -.e.;__>>
!!!!! no matching here !!!!!
<<__keyword -.e-.;__>>
!!!!! no matching here !!!!!
PS: I have read that the ?: might not save resources while the code is running, and it makes the regex less eye-friendly, so one might leave it out.
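Whatever the effect on resources, the ?: does change which groups get a number. A minimal sketch, using one of the test lines above with a shortened form of the regex (the variable names are only for illustration): with plain brackets the mantissa and the exponent become $2 and $3, while with (?:...) only the whole number is captured.

```perl
use strict;
use warnings;

my $line = "keyword -3.000e0014;";

# Plain brackets everywhere: the mantissa and the exponent are captured too.
my @capturing = $line =~ /keyword\s+(-?(\d+\.?\d*|\.\d+)([Ee]-?\d+)?);/;

# With (?:...) the inner groups only group; just the outer bracket captures.
my @non_capturing = $line =~ /keyword\s+(-?(?:\d+\.?\d*|\.\d+)(?:[Ee]-?\d+)?);/;

print scalar(@capturing), " captures: @capturing\n";
print scalar(@non_capturing), " capture: @non_capturing\n";
```

So leaving out the ?: is fine as long as only $1 is used, but any later group numbers shift.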
Upvotes: 0
Reputation: 5139
There is another way to do this, and you don't need regular expressions for it. You can use looks_like_number from Scalar::Util.
Here's an example from "How do I tell if a variable has a numeric value in Perl?", pasted here for you.
Example:
#!/usr/local/bin/perl
use warnings;
use strict;
use Scalar::Util qw(looks_like_number);
my @exprs = qw(1 5.25 0.001 1.3e8 foo bar 1dd);
foreach my $expr (@exprs) {
    print "$expr is", looks_like_number($expr) ? '' : ' not', " a number\n";
}
Gives this output:
1 is a number
5.25 is a number
0.001 is a number
1.3e8 is a number
foo is not a number
bar is not a number
1dd is not a number
edit: @borodin's comment
You would use it in a way like this:
my $value;
open(FILE,"data.dat") or die "error on opening data: $!\n";
while (my $line = <FILE>) {
    if ($line =~ /^keyword +(.*)/) {
        my $number = $1;
        if ( looks_like_number($number) ) {
            $value = $number;
        }
    }
}
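A self-contained variant of the same idea, as a rough sketch: the helper name value_after_keyword is made up for this example, and taking only the first whitespace-delimited token after keyword (\S+ instead of .*) is an assumption about the data, so that trailing text on the line does not reach looks_like_number.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: returns the token after "keyword" if it looks like
# a number, otherwise undef.
sub value_after_keyword {
    my ($line) = @_;
    return undef unless $line =~ /^keyword\s+(\S+)/;
    my $token = $1;
    return looks_like_number($token) ? $token : undef;
}

for my $line ("keyword 83.12300", "keyword -9e-4", "keyword 1dd", " word 25") {
    my $value = value_after_keyword($line);
    print "$line => ", defined($value) ? $value : "no number", "\n";
}
```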
edit: if you have to have a regex, you can use an expression like this:
#!/bin/perl
use strict;
use warnings;
my @numbers = ( 'keyword 13',
' word 25',
'keyword -999',
'keyword 83.12300',
'keyword .151',
'keyword -.213',
'keyword 1e14',
'keyword 124e2',
'keyword -9e-4 ',
' keyword e43e',
'keyword 4.5.6',
'keyword 4..e',
'keyword NaN',
'keyword Inf');
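for (@numbers) {
    # \s*$ anchors the end of the line, so a partial match such as
    # the 4.5 in 'keyword 4.5.6' is not reported as a number
    if ( /^keyword +(-?((\d+\.?\d*)|(\d*\.?\d+))([Ee]-?\d+)?)\s*$/ ) {
        print "$1 is a number\n";
    } else {
        print "$_ does not match keyword or is not a number\n";
    }
}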
Upvotes: 1
Reputation: 1213
Go to CPAN and get Regexp::Common.
Use it like this:
use Regexp::Common;
my $re = $RE{num}{real};
if ( $line =~ /^keyword\s+($re)/ ) {
    $value = $1;
}
Much easier than do-it-yourself regular expression rolling.
Upvotes: 2
Reputation: 10786
The second regex in your code seems to be redundant; you can safely remove it. The first regex should match all your test cases. Is there anything it doesn't seem to work with?
You should also tweak your regex, because it currently considers -.e-. to be a number. This comes from \d*\.\d*, which also matches a lone dot (.). You could try (?:\d+(?:\.\d*)?|\.\d+) instead, which matches either 1) digits, 2) digits followed by a decimal point and possibly more digits, or 3) a decimal point followed by digits.
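Here is a rough sketch of that suggestion dropped into your pattern. The helper name extract_value, the integer-only exponent and the \s*$ at the end are my assumptions, not something your data necessarily requires.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number following "keyword", or undef.
sub extract_value {
    my ($line) = @_;
    return $line =~ /^keyword\s+(-?(?:\d+(?:\.\d*)?|\.\d+)(?:[Ee]-?\d+)?)\s*$/
        ? $1
        : undef;
}

for my $line ("keyword 83.12300", "keyword -.213", "keyword -9e-4", "keyword -.e-.") {
    my $value = extract_value($line);
    print "$line => ", defined($value) ? $value : "no match", "\n";
}
```

Because \d+(?:\.\d*)? is greedy, the fractional part ends up in $1 as well, and -.e-. no longer matches since every alternative needs at least one digit.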
Upvotes: 1