Reputation: 89
I want to change the decimal separators in a file from commas to full stops, and I wanted to try doing this in Perl. An example of my dataset looks something like this:
Species_1:0,12, Species_2:0,23, Species_3:2,53
I want to replace only the decimal commas, not the commas that separate the entries, so that I get:
Species_1:0.12, Species_2:0.23, Species_3:2.53
I was thinking it might work using the substitution function like this:
$comma_file = "Species_1:0,12, Species_2:0,23, Species_3:2,53";
$comma = qr/(:\d+,\d)/;
#match a colon, any digits after the colon, the wanted comma and the digit following it
if ($comma_file =~ m/$comma/g) {
$comma_file =~ tr/,/./;
}
print "$comma_file\n";
However, when I tried this, all my commas changed into full stops, not just the ones I was targeting. Is it an issue with the regex, or am I just not doing the match and substitution correctly?
Thanks!
Upvotes: 1
Views: 213
Reputation: 66883
From the data shown it appears that a comma to be replaced always has a digit on each side, and that every such occurrence needs to be replaced. There is a fine answer by GMB.
Another way to handle this kind of problem is to use lookarounds:
$comma_file =~ s/(?<=[0-9]),(?=[0-9])/./g;
which should be more efficient, as there is no copying into $1 and $2 and there are no quantifiers.
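To make the point about lookarounds concrete, here is a minimal sketch (the variable names are mine, picked for illustration). The lookbehind and lookahead only assert that a digit sits next to the comma, so the match itself should consist of the comma alone and the digits never need to be captured and put back.

```perl
use strict;
use warnings;

my $line = "Species_1:0,12, Species_2:0,23, Species_3:2,53";

# Collect what the pattern actually consumes. The assertions are zero-width,
# so each match is just the comma, never the neighbouring digits.
my @matched = $line =~ /((?<=[0-9]),(?=[0-9]))/g;

# s///r (Perl 5.14+) returns the modified copy and leaves $line unchanged.
my $fixed = $line =~ s/(?<=[0-9]),(?=[0-9])/./gr;

print "number of matches: ", scalar(@matched), "\n";
print "$fixed\n";
```

The commas that separate the entries are followed by a space rather than a digit, so the lookahead fails there and they are left alone.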
My benchmark
use warnings;
use strict;
use feature 'say';
use Benchmark qw(cmpthese);
my $str = q(Species_1:0,12, Species_2:0,23, Species_3:2,53);
sub subs {
my ($str) = @_;
$str =~ s/(\d+),(\d+)/$1.$2/g;
return $str;
}
sub look {
my ($str) = @_;
$str =~ s/(?<=\d),(?=\d)/./g;
return $str;
}
die "Output not equal" if subs($str) ne look($str);
cmpthese(-3, {
subs => sub { my $res = subs($str) },
look => sub { my $res = look($str) },
});
with output
         Rate subs look
subs 256126/s   -- -46%
look 472677/s  85%   --
This is only one particular string, but the efficiency advantage should only grow with the length of the string, while longer matches (longer numbers here) should reduce it a little.
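For anyone who wants to check the claim about string length, the same comparison can be rerun on a longer input. This is only a sketch: the 1000-entry test string is made up in the shape of the sample data, and I am not quoting timings since they will differ between machines and Perl versions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 entries shaped like the sample in the question.
my $long = join ', ', map { "Species_$_:0,$_" } 1 .. 1000;

# Capture-group version, as in GMB's answer.
sub subs { my ($s) = @_; $s =~ s/(\d+),(\d+)/$1.$2/g; return $s }

# Lookaround version.
sub look { my ($s) = @_; $s =~ s/(?<=\d),(?=\d)/./g; return $s }

die "Output not equal" if subs($long) ne look($long);

# Run each variant for at least 1 CPU second and print the comparison table.
cmpthese(-1, {
    subs => sub { my $res = subs($long) },
    look => sub { my $res = look($long) },
});
```

Changing the range in the map (or the numbers generated after the comma) lets you vary the string length and the length of the numbers independently.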
Upvotes: 2
Reputation: 222462
This:
use strict;
use warnings;
my $comma_file = "Species_1:0,12, Species_2:0,23, Species_3:2,53";
$comma_file =~ s/(\d+),(\d+)/$1.$2/g;
print $comma_file, "\n";
Yields:
Species_1:0.12, Species_2:0.23, Species_3:2.53
The regex searches for commas that have at least one digit on both sides and replaces each of them with a dot.
Your code doesn’t work because you first check whether the string contains a comma surrounded by digits and, if it does, you then replace ALL commas with dots: tr/,/./ operates on the whole string, not on the part that matched.
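A small side-by-side sketch of the difference (the variable names are just for illustration): tr/// knows nothing about an earlier match and transliterates every comma in the string, while s///g only touches the text the pattern matched.

```perl
use strict;
use warnings;

my $line = "Species_1:0,12, Species_2:0,23, Species_3:2,53";

# Copy the string, then transliterate: every comma becomes a dot,
# including the ones that separate the entries.
(my $with_tr = $line) =~ tr/,/./;

# Copy the string, then substitute: only commas between digits are replaced.
(my $with_s = $line) =~ s/(\d+),(\d+)/$1.$2/g;

print "tr: $with_tr\n";   # Species_1:0.12. Species_2:0.23. Species_3:2.53
print "s:  $with_s\n";    # Species_1:0.12, Species_2:0.23, Species_3:2.53
```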
Upvotes: 3