Reputation: 25
Can anybody tell me a regular expression in Perl for uppercasing the first letter of the word that comes after a period, question mark, or exclamation mark?
My program reads string character by character.
Requirement :
input string : "abcd[.?!]\s*abcd"
output: "Abcd[.?!]\s*Abcd"
My program is as follows:
#!/usr/bin/perl
use strict;
my $str = <STDIN>;
my $len=length($str);
my $ch;
my $i;
for($i=0;$i<=length($str);$i++)
{
    $ch = substr($str,$i,1);
    print "$ch";
    if($ch =~ 's/([.?!]\s*[a-z])/uc($1)/ge')
    {
        $i=$i+1;
        $ch = substr($str, $i,1);
        my $ch = uc($ch);
        print "$ch";
    }
    #elsif($ch eq "?")
    #{
    # $i=$i+1;
    # $ch = substr($str, $i,1);
    # my $ch = uc($ch);
    # print "$ch";
    #}
    #elsif($ch eq "!")
    #{
    # $i=$i+1;
    # $ch = substr($str, $i,1);
    # my $ch = uc($ch);
    # print"$ch";
    #}
    #elsif($ch eq " ")
    #{
    # $i=$i+1;
    # $ch = substr($str, $i,1);
    # my $ch = uc($ch);
    # print"$ch";
    #}
    #else
    #{
    #print "";
    #}
}
print "\n";
Upvotes: 1
Views: 1482
Reputation: 91430
If you have a Unicode string, you could use:
$str =~ s/(\pP|^)(\s*\pL)/$1\U$2/g;
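As a rough illustration of how that substitution behaves, here is a minimal sketch. The helper name capitalize_after_punct and the sample strings are assumptions made for the example, not part of the answer. Note that \pP matches any punctuation character, so apostrophes, commas, and hyphens also trigger capitalization.

```perl
use strict;
use warnings;
use utf8;                                  # this source contains non-ASCII literals
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper name, introduced only for this sketch.
sub capitalize_after_punct {
    my ($str) = @_;
    # $1: any punctuation character, or the start of the string.
    # $2: optional whitespace followed by one letter; \U upper-cases it.
    $str =~ s/(\pP|^)(\s*\pL)/$1\U$2/g;
    return $str;
}

print capitalize_after_punct("привет. как дела? хорошо!"), "\n";
# The apostrophe is punctuation too, so the "s" after it is upper-cased.
print capitalize_after_punct("it's ok. fine"), "\n";
```

If only sentence-ending punctuation should count, replacing \pP with [.?!] narrows the match to what the question describes.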
Upvotes: 0
Reputation: 2736
Can anybody tell me the regex in Perl for uppercasing the first letter of the word that comes after a period, question mark, or exclamation mark?
My program reads string character by character.
Requirement :
input string : "abcd[.?!]\s*abcd"
output: "Abcd[.?!]\s*Abcd"
Your output does not match your explanation. In the input, the initial "a" does not follow a period, question mark, or exclamation mark, but was changed to upper case.
You can and should do this sort of processing with a single substitution. To do exactly as you said:
s/[.?!]\K[[:lower:]]/uc($&)/ge
The \K discards the character matched by [.?!], leaving only the lower-case letter in the matched string. $& is the matched string. The e flag says to evaluate uc($&).
If you also want to make an initial letter uppercase:
s/(?:^|[.?!])\K[[:lower:]]/uc($&)/ge
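A minimal sketch of that substitution wrapped in a function follows. The helper name ucfirst_sentences is hypothetical, and the \s* before \K is an addition here (not in the patterns above) so that whitespace between the punctuation and the next word is skipped, as the requirement in the question suggests. \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper name, introduced only for this sketch.
sub ucfirst_sentences {
    my ($text) = @_;
    # (?:^|[.?!]) anchors at the start of the string or after sentence punctuation.
    # \s* skips optional whitespace (an assumption added for this sketch).
    # \K drops everything matched so far, so $& holds only the lower-case letter.
    $text =~ s/(?:^|[.?!])\s*\K[[:lower:]]/uc($&)/ge;
    return $text;
}

print ucfirst_sentences("abcd. abcd? abcd!abcd"), "\n";   # Abcd. Abcd? Abcd!Abcd
```

Letters in the middle of a sentence are left alone because neither the ^ anchor nor the punctuation class can match in front of them.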
Upvotes: 0
Reputation: 385897
Normally, any one of the following would do:
$s =~ s/(?<=[.?!]|^)\s*[a-z]/\U$&/g;
$s =~ s/(?<![^.?!])\s*[a-z]/\U$&/g;
$s =~ s/(?:^|[.?!])\s*\K[a-z]/\U$&/g;
But if you only read one character at a time,
my $after_punc = 1;
while (my $ch = ...) {
    if ($ch =~ /^[.?!]\z/) {
        $after_punc = 1;
    }
    elsif ($ch =~ /^[a-z]\z/) {
        $ch = uc($ch) if $after_punc;
        $after_punc = 0;
    }
    elsif ($ch =~ /^\s\z/) {
        # Ignore whitespace.
    }
    else {
        $after_punc = 0;
    }
    ...
}
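To make the loop above concrete, here is one possible runnable version. It is only a sketch: the function name capitalize_by_char is hypothetical, and split // stands in for whatever character source the elided parts of the loop would use.

```perl
use strict;
use warnings;

# Hypothetical helper: walks the string one character at a time.
sub capitalize_by_char {
    my ($str) = @_;
    my $out = '';
    my $after_punc = 1;          # the start of input counts as "after punctuation"
    for my $ch (split //, $str) {
        if ($ch =~ /^[.?!]\z/) {
            $after_punc = 1;     # next letter should be upper-cased
        }
        elsif ($ch =~ /^[a-z]\z/) {
            $ch = uc($ch) if $after_punc;
            $after_punc = 0;
        }
        elsif ($ch =~ /^\s\z/) {
            # Whitespace leaves the flag unchanged.
        }
        else {
            $after_punc = 0;     # digits, upper-case letters, other symbols
        }
        $out .= $ch;
    }
    return $out;
}

print capitalize_by_char("abcd.  abcd? abcd!abcd"), "\n";   # Abcd.  Abcd? Abcd!Abcd
```

The single flag is the whole state machine: punctuation sets it, whitespace preserves it, and anything else clears it.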
Upvotes: 0
Reputation: 189477
Looping over the string, and then looping over the match, is completely redundant. Your entire program can be replaced with this:
perl -pe 's/(^|[.?!]\s*)([a-z])/$1\U$2/g' inputfile >outputfile
I added beginning of line to the first parenthesized expression, although your explanation doesn't include that (but your example does).
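For anyone who wants the same thing inside a script rather than on the command line, the following sketch mimics what the -p switch does: read each line, apply the substitution, and write the line out. The function name capitalize_lines and the in-memory filehandles are assumptions made for the demo. Because the substitution runs once per line, ^ matches at the start of every line.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the -p loop: read a line, substitute, emit.
sub capitalize_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        # $1: start of line, or punctuation plus optional whitespace.
        # $2: the lower-case letter to upper-case.
        $line =~ s/(^|[.?!]\s*)([a-z])/$1\U$2/g;
        print {$out} $line;
    }
    return;
}

# Demo on in-memory filehandles; with real files, pass handles from open() instead.
my $input  = "first line. second sentence\nnext line? yes\n";
my $output = '';
open(my $in,  '<', \$input)  or die "cannot open input: $!";
open(my $out, '>', \$output) or die "cannot open output: $!";
capitalize_lines($in, $out);
close($out);
print $output;
```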
Upvotes: 1