Reputation: 122092
In Perl, there's the ucfirst function.
Is it equivalent to the following:
sub uppercase {
    my ($W) = @_;
    $$W = uc(substr($$W,0,1)).substr($$W,1);
}
Does the answer differ across Perl versions?
For context, the code comes from this pull request: https://github.com/moses-smt/mosesdecoder/pull/206/files#diff-876e51db2a1ab71c1ae736182d1e5e04R63
Previously, uppercase was used like this:
sub process {
    my $line = $_[0];
    chomp($line);
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    my @WORD = split(/\s+/,$line);
    # uppercase at sentence start
    my $sentence_start = 1;
    for(my $i=0;$i<scalar(@WORD);$i++) {
        &uppercase(\$WORD[$i]) if $sentence_start;
        if (defined($SENTENCE_END{ $WORD[$i] })) { $sentence_start = 1; }
        elsif (!defined($DELAYED_SENTENCE_START{$WORD[$i] })) { $sentence_start = 0; }
    }
    # uppercase headlines {
    if (defined($SRC) && $HEADLINE[$sentence]) {
        foreach (@WORD) {
            &uppercase(\$_) unless $ALWAYS_LOWER{$_};
        }
    }
But replacing &uppercase(\$WORD[$i]) and &uppercase(\$_) with ucfirst(\$WORD[$i]) and ucfirst(\$_) seems to give different results.
Upvotes: 1
Views: 581
Reputation: 69274
In Perl, there's the ucfirst function.
Is it equivalent to the following:
Let's find out...
$ cat testuc
use strict;
use warnings;
use Test::More;
sub uppercase {
    my ($w) = @_;
    return uc(substr($w, 0, 1)) . substr($w, 1);
}
my @tests = qw[foobar Foobar FOOBar fOObar fOObAR FOOBAR];
for (@tests) {
    is(ucfirst($_), uppercase($_), "correct for $_");
}
done_testing;
$ prove -v testuc
testuc ..
ok 1 - correct for foobar
ok 2 - correct for Foobar
ok 3 - correct for FOOBar
ok 4 - correct for fOObar
ok 5 - correct for fOObAR
ok 6 - correct for FOOBAR
1..6
ok
All tests successful.
Files=1, Tests=6, 0 wallclock secs ( 0.04 usr 0.03 sys + 0.03 cusr 0.04 csys = 0.14 CPU)
Result: PASS
So, yes, it looks like they're the same thing (at least for my rather limited set of tests).
I'm using Perl 5.26.1, but I think this will work the same in all Perl versions back to at least 5.10.
Update:
I made a silent edit to your code which I forgot to mention. Your code originally worked on a reference to a scalar, but I changed it to work on a scalar ($W instead of $$W). I assumed that would be a harmless substitution.
But now you've shown us your change in context and I can see what's going on.
You had:
&uppercase(\$WORD[$i])
And you changed that to:
ucfirst(\$WORD[$i])
This doesn't work because ucfirst() doesn't change its argument; it returns the changed value. So you actually want:
$WORD[$i] = ucfirst($WORD[$i]);
That will then work as expected (modulo the Unicode character issues mentioned in other answers).
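A small sketch of the difference, using a plain scalar named $word (the name is just for illustration): ucfirst only returns a new value, and handing it a reference capitalizes the reference's string form rather than the string it points to.

```perl
use strict;
use warnings;

my $word = "hello";

# ucfirst returns a new string and leaves its argument alone.
my $result = ucfirst($word);
print "$word $result\n";    # prints "hello Hello"

# A reference is stringified first (something like "SCALAR(0x...)"),
# so ucfirst(\$word) just returns that string; $word is not modified.
my $from_ref = ucfirst(\$word);
print "$from_ref\n";
print "$word\n";            # still prints "hello"

# The fix: assign the returned value back to the variable.
$word = ucfirst($word);
print "$word\n";            # prints "Hello"
```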
Your whole loop can be simplified if you move away from the C-style for loop.
for my $w (@WORD) {
    $w = ucfirst($w) if $sentence_start;
    if (defined $SENTENCE_END{ $w }) {
        $sentence_start = 1;
    } elsif (!defined $DELAYED_SENTENCE_START{ $w }) {
        $sentence_start = 0;
    }
}
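To see the simplified loop in action, here is a standalone sketch. The contents of %SENTENCE_END and %DELAYED_SENTENCE_START and the sample sentence are made up for illustration; the original script defines its own tables. Because the foreach variable $w is an alias for each element, assigning to it updates @WORD in place, which is why no index or reference is needed.

```perl
use strict;
use warnings;

# Hypothetical sample tables; the original script defines its own.
my %SENTENCE_END           = ('.' => 1, '!' => 1, '?' => 1);
my %DELAYED_SENTENCE_START = ('(' => 1, '"' => 1, "'" => 1);

my @WORD = split /\s+/, 'hello world . ( this is it ) ! ok';
my $sentence_start = 1;

for my $w (@WORD) {
    # $w aliases the current element, so this assignment changes @WORD itself.
    $w = ucfirst($w) if $sentence_start;
    if (defined $SENTENCE_END{ $w }) {
        $sentence_start = 1;
    } elsif (!defined $DELAYED_SENTENCE_START{ $w }) {
        $sentence_start = 0;
    }
}

print "@WORD\n";    # prints "Hello world . ( This is it ) ! Ok"
```

The "(" token keeps $sentence_start set, so the capital lands on "This" rather than on the bracket.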
Upvotes: 2
Reputation: 385897
ucfirst is not equivalent to the following:
sub uppercase {
    my ($W) = @_;
    $$W = uc(substr($$W,0,1)).substr($$W,1);
}
ucfirst is mostly[1] equivalent to the following:
sub ucfirst {
    my ($W) = @_;
    return uc(substr($W,0,1)).substr($W,1);
}
If you wanted to rewrite uppercase in terms of ucfirst, it would look like this:
sub uppercase {
    my ($W) = @_;
    $$W = ucfirst($$W);
}
uppercase(\$string);
That means that if you wanted to eliminate uppercase entirely, you'd replace
uppercase(\$string);
with
$string = ucfirst($string); # Correct
You tried using
ucfirst(\$string); # Wrong
[1] ucfirst actually does a better job of handling more esoteric characters such as U+01F3 LATIN SMALL LETTER DZ ("dz").
Upvotes: 2
Reputation: 72356
The functions are not equivalent because of some Unicode details, especially those involving digraphs.
For example, Hungarian uses the digraph "DZ", which is considered a single letter of the alphabet, and so it can optionally be represented by one of these Unicode code points:
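U+01F1: DZ (LATIN CAPITAL LETTER DZ)
U+01F2: Dz (LATIN CAPITAL LETTER D WITH SMALL LETTER Z)
U+01F3: dz (LATIN SMALL LETTER DZ)
So this code: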
my $text1 = "\x{1f3}won";
my $text2 = $text1;
$text1 = ucfirst($text1);
uppercase(\$text2);
print($text1 eq $text2 ? "same\n" : "different\n");
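prints "different": ucfirst maps the first character to its titlecase form (U+01F2, "Dz"), whereas uc maps it to its uppercase form (U+01F1, "DZ").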
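To check the digraph case without the rest of the question's code, here is a self-contained sketch. It reuses the question's reference-based uppercase sub and prints the code point that each approach produces for the first character.

```perl
use strict;
use warnings;

# The question's reference-based helper, reproduced so this sketch runs standalone.
sub uppercase {
    my ($W) = @_;
    $$W = uc(substr($$W, 0, 1)) . substr($$W, 1);
}

my $text1 = "\x{1F3}won";    # U+01F3 LATIN SMALL LETTER DZ followed by "won"
my $text2 = $text1;

$text1 = ucfirst($text1);    # titlecase mapping of the first character
uppercase(\$text2);          # uppercase mapping of the first character

printf "ucfirst:   U+%04X\n", ord $text1;    # prints "ucfirst:   U+01F2"
printf "uppercase: U+%04X\n", ord $text2;    # prints "uppercase: U+01F1"
```

For ordinary letters the titlecase and uppercase forms coincide, which is why the earlier tests with ASCII words all passed.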
Upvotes: 2