Reputation: 2221
Hello, I'm using an online regex tester to check that what I'm writing is correct. I want to extract information from the following texts:
Test23242asdsa 800,03 23.05.19 22.05.19
Tdsadas,tsadsa test 1.020,03 23.05.19 22.05.19
Test,23242 0,03 23.05.19 22.05.19
I tried to use the same regex in Perl:
use strict;
my $entry = 'Test23242asdsa 0,03 23.05.19 22.05.19';
my ($name, $expense, $date_expense, $date_paid) = $1, $2, $3, $4 if ($entry =~ m/^(.+)\s((?:\d+\.)?\d{1,3},\d{2})\s(\d{2}\.\d{2}\.\d{2})\s(\d{2}\.\d{2}\.\d{2})$/);
print "Name: '$name', Expense: '$expense', Date: '$date_expense', Date Paid: '$date_paid' \n";
And if I use the same regex here:
https://regex101.com/
^(.+)\s((?:\d+\.)?\d{1,3},\d{2})\s(\d{2}\.\d{2}\.\d{2})\s(\d{2}\.\d{2}\.\d{2})$
It matches the regex correctly. I thought Python and Perl used the same regex syntax, so I don't understand what is going on.
Upvotes: 4
Views: 270
Reputation: 507
While not directly related to your precedence-in-assignment problem, it's worth noting that a statement like:
my $x if 0
is called "my() in a false conditional" and is prohibited as of Perl 5.30.0.
So, in general you should avoid doing this kind of "conditional declaration":
my ($foo, $baz) = ($1, $2) if $entry =~ m/^(\w+) bar (\w+)$/
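because its behavior is undefined when the condition (the regexp match, in this case) is false: perlsyn warns that the variable may be undef, may hold a previously assigned value, or may be anything else.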
For the specific case, it's much better to rely on brian d foy's advice and assign the result directly to the list of variables you want to populate. The result of the assignment can then be used in boolean context, like in the following example:
if (my ($foo, $baz) = $entry =~ m/^(\w+) bar (\w+)$/) {
    # use $foo and $baz for fun and profit...
}
or the following, if you're looping over lines:
while (defined(my $entry = <$fh>)) {
    my ($foo, $baz) = $entry =~ m/^(\w+) bar (\w+)$/
        or next; # <-- executed only when the regexp match fails
    # use $foo and $baz for fun and profit...
}
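For what it's worth, here is a self-contained sketch of that loop using the regex and sample lines from the question. The in-memory filehandle, the non-matching line, and the @names array are my own assumptions, added only so the snippet runs on its own. It relies on the fact that a list assignment in scalar (boolean) context evaluates to the number of elements on its right-hand side, which is zero when the match fails:

```perl
use strict;
use warnings;

# Sample input: two lines from the question plus one that should not match.
my $text = join "\n",
    'Test23242asdsa 800,03 23.05.19 22.05.19',
    'this line does not match',
    'Test,23242 0,03 23.05.19 22.05.19',
    '';

# Read the string through an in-memory filehandle (assumption, stands in for a real file).
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @names;
while (defined(my $entry = <$fh>)) {
    # Declaration and match in one statement; "or next" skips lines that fail to match.
    my ($name, $expense, $date_expense, $date_paid) =
        $entry =~ m/^(.+)\s((?:\d+\.)?\d{1,3},\d{2})\s(\d{2}\.\d{2}\.\d{2})\s(\d{2}\.\d{2}\.\d{2})$/
        or next;
    push @names, $name;
    print "Name: '$name', Expense: '$expense', Date: '$date_expense', Date Paid: '$date_paid'\n";
}
close $fh;
```

Only the first and third lines should be printed; the middle one is skipped by "or next".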
Upvotes: 1
Reputation: 132832
Although it might not fit into your task, Perl's match operator returns the list of captures in list context. You can then look at the variables to see if they have values.
In this example I used the /x flag to spread out the regex by making whitespace insignificant and allowing inline comments. It's a bit easier to read that way, and Perl is much more lenient than Python in expanding statements across lines:
#!perl
use strict;
my $entry = '';
my ($name, $expense, $date_expense, $date_paid) =
    $entry =~ m/
        ^
        ( .+ )                       \s  # Name
        ( (?:\d+\.)?\d{1,3},\d{2} )  \s  # Expense
        ( \d{2}\.\d{2}\.\d{2} )      \s  # Date
        ( \d{2}\.\d{2}\.\d{2} )          # Date Paid
        $
    /x;
if( defined $name ) {
    print <<"HERE";
Name: $name
Expense: $expense
Date: $date_expense
Date Paid: $date_paid
HERE
}
else {
    print "No match!";
}
But Perl also supports Python-style named captures. Instead of relying on the capture position in a long regex, you give each capture a name with (?P<label>...) (although most Perl people will leave off the P for just (?<label>...)). The capture names are keys in the %+ hash:
#!perl
use strict;
my $entry = 'Test23242asdsa 800,03 23.05.19 22.05.19';
$entry =~ m/
    ^
    (?P<name>      .+ )                       \s
    (?P<expense>   (?:\d+\.)?\d{1,3},\d{2} )  \s
    (?P<date>      \d{2}\.\d{2}\.\d{2} )      \s
    (?P<date_paid> \d{2}\.\d{2}\.\d{2} )
    $
/x;
if( defined $+{name} ) {
    print <<"HERE";
Name: $+{name}
Expense: $+{expense}
Date: $+{date}
Date Paid: $+{date_paid}
HERE
}
else {
    print "No match!";
}
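If you want to keep the captures around, note that %+ only reflects the last successful match, so one option is to copy it into a plain hash right away. A minimal sketch, assuming the three sample lines from the question; @entries and @records are names I made up for illustration:

```perl
use strict;
use warnings;

# Sample lines taken from the question.
my @entries = (
    'Test23242asdsa 800,03 23.05.19 22.05.19',
    'Tdsadas,tsadsa test 1.020,03 23.05.19 22.05.19',
    'Test,23242 0,03 23.05.19 22.05.19',
);

my @records;
for my $entry (@entries) {
    next unless $entry =~ m/
        ^
        (?<name>      .+ )                       \s
        (?<expense>   (?:\d+\.)?\d{1,3},\d{2} )  \s
        (?<date>      \d{2}\.\d{2}\.\d{2} )      \s
        (?<date_paid> \d{2}\.\d{2}\.\d{2} )
        $
    /x;

    # Copy the named captures before another match can replace them.
    my %record = %+;
    push @records, \%record;
}

for my $rec (@records) {
    print "Name: $rec->{name}, Expense: $rec->{expense}, ",
          "Date: $rec->{date}, Date Paid: $rec->{date_paid}\n";
}
```

Each element of @records is then an ordinary hash reference keyed by the capture names, independent of any later regex match.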
Upvotes: 3
Reputation: 781255
The regexp is fine, the problem is with how you're setting the variables.
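You need to wrap $1, $2, $3, $4 in parentheses to do list assignment, because of Perl's operator precedence: = binds more tightly than the comma operator, so without the parentheses the statement is parsed as (my (...) = $1), $2, $3, $4 and only $name gets a value.
Change it to: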
my ($name, $expense, $date_expense, $date_paid) = ($1, $2, $3, $4) if ($entry =~ m/^(.+)\s((?:\d+\.)?\d{1,3},\d{2})\s(\d{2}\.\d{2}\.\d{2})\s(\d{2}\.\d{2}\.\d{2})$/);
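To see the difference side by side, here is a small sketch. The simplified two-capture pattern and the variable names are made up for illustration; they are not the question's regex:

```perl
use strict;
use warnings;

my $entry = 'Test23242asdsa 0,03 23.05.19 22.05.19';
my ($name_bad, $expense_bad, $name_ok, $expense_ok);

# Simplified two-capture pattern, for illustration only.
if ($entry =~ m/^(\S+)\s(\S+)/) {
    {
        # The stray $2 below may otherwise trigger a "useless use" warning.
        no warnings 'void';
        # Parsed as (($name_bad, $expense_bad) = $1), $2;
        ($name_bad, $expense_bad) = $1, $2;
    }
    # With parentheses the right-hand side is a two-element list.
    ($name_ok, $expense_ok) = ($1, $2);
}

printf "without parens: name=%s, expense=%s\n", $name_bad, $expense_bad // 'undef';
printf "with parens:    name=%s, expense=%s\n", $name_ok, $expense_ok;
```

Without the parentheses only the first variable is filled and the second stays undef, which matches the empty fields in the question's output.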
Upvotes: 6
Reputation: 6798
Why not try something like this?
use strict;
use warnings;
use feature 'say';
my $re = qr/(\D+|\S+) ([\d,\.]+) ([\d\.]+) ([\d\.]+)/;
my @data;
while(<DATA>) {
    next unless /$re/;
    push @data, { name => $1, amount => $2, date_exp => $3, date_paid => $4 };
}
printf " %-20s | %10s | %-8s | %-8s |\n", qw/Name Amount Expense Paid/;
print '-' x 58 . "\n";
for my $rec (@data) {
    printf " %-20s | %10s | %8s | %8s |\n", @$rec{qw/name amount date_exp date_paid/};
}
__DATA__
Test23242asdsa 800,03 23.05.19 22.05.19
Tdsadas,tsadsa test 1.020,03 23.05.19 22.05.19
Test,23242 0,03 23.05.19 22.05.19
Output
 Name                 |     Amount | Expense  | Paid     |
----------------------------------------------------------
 Test23242asdsa       |     800,03 | 23.05.19 | 22.05.19 |
 Tdsadas,tsadsa test  |   1.020,03 | 23.05.19 | 22.05.19 |
 Test,23242           |       0,03 | 23.05.19 | 22.05.19 |
Upvotes: -1