Reputation: 71
I have some tags with values like the ones below:
<section>
<title id="ABC0123">is The human nervous system?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="DEF0123">Terms for anatomical directions in the nervous system</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="ABC4356">Anatomical terms: is referring to directions</title>
.
.
.
The output I need looks like this:
<section>
<title id="ABC0123">Is the Human Nervous System?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="DEF0123">Terms for Anatomical Directions in the Nervous System</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="ABC4356">Anatomical Terms: Is Referring to Directions</title>
.
.
How could I do this using Perl? All prepositions and articles should be in lower case, but the conditions now differ slightly, as follows.
If a word that is in @lowercase (say, "is") is the first word of the <title> and is in lower case, it should be capitalized. Likewise, any @lowercase word that comes directly after a colon in the <title> should be capitalized.
Upvotes: 0
Views: 140
Reputation: 2393
New answer to match the updated question (the sample input and desired output have changed since the original question). Updated again on Mar 9, 2014, per the OP's request to always uppercase the first word in a title tag.
#!/usr/bin/perl
use strict;
use warnings;

# Add your articles and prepositions here!!!
my @lowercase = qw(a an at for in is the to);

# Use a hash, since lookups are easier later.
# The values are irrelevant; only the keys are checked.
my %lowercase = map { $_ => 1 } @lowercase;

open(my $fh, '<', 'foo.txt') or die "Cannot open foo.txt: $!";
while (my $line = <$fh>) {
    if ($line =~ m/^<title/i) {
        chomp $line;
        my @words;

        # Save the opening <title> tag
        my $titleTag = $line;
        $titleTag =~ s/^(<[^>]*>).*/$1/;

        # Remove any tags in <brackets>
        $line =~ s/<[^>]*>//g;

        # Uppercase the first letter of every word, except for those in
        # @lowercase. The first word, and any word right after a colon,
        # is always capitalized.
        my $capitalizeNext = 1;
        foreach my $word (split(' ', $line)) {
            if (!$capitalizeNext && exists $lowercase{lc $word}) {
                push(@words, lc($word));
            }
            else {
                push(@words, ucfirst($word));
            }
            $capitalizeNext = ($word =~ /:$/);
        }
        print $titleTag . join(" ", @words) . "</title>\n";
    }
    else {
        print $line;
    }
}
close($fh);
This code makes 2 assumptions:
1. <title>...</title> is on a single line. It never wraps to more than one line in the file.
2. The <title> tag is at the beginning of the line. This can easily be changed in the code if desired, though.
Upvotes: 0
Reputation: 151190
Probably something like this then:
#!/usr/bin/env perl
use strict;
use warnings;

my $lines = qq#
<title>The human nervous system</title>
<title>Terms for anatomical directions in the nervous system</title>
<title>Anatomical terms referring to directions</title>
#;

foreach my $line ( split(/\n/, $lines) ) {
    $line =~ s|</?title>||g;
    if ( $line =~ /\w+/ ) {    # Skip if blank (note: =~ binds the match to $line)
        print "<title>" . ucfirst(
            join(" ",
                map{ !/^(in|the|on|or|to|for)$/i ? ucfirst($_) : lc($_); }
                split(/\s/, $line)
            )
        ) . "</title>\n";
    }
}
Loop over your file however you like, but you will have to filter out the terms you don't want converted, as the map in the code above does.
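As a minimal sketch of the same filtering idea, extended with the two exceptions from the question (the first word of a title and any word right after a colon are always capitalized), something like the following might work. The helper names title_case and convert_line are hypothetical, the word list is only an assumption to be adjusted, and the sketch still assumes each <title>...</title> sits on a single line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Words kept in lower case unless they start the title or follow a colon.
# This list is an assumption; extend it as needed.
my %lowercase = map { $_ => 1 } qw(a an at for in is on or the to);

# title_case() is a hypothetical helper name, not part of either answer.
sub title_case {
    my ($text) = @_;
    my $capitalize_next = 1;    # the first word is always capitalized
    my @words;
    foreach my $word (split ' ', $text) {
        if (!$capitalize_next && $lowercase{lc $word}) {
            push @words, lc $word;
        }
        else {
            push @words, ucfirst $word;
        }
        # A word ending in a colon forces the next word to be capitalized.
        $capitalize_next = ($word =~ /:$/) ? 1 : 0;
    }
    return join ' ', @words;
}

# Rewrite the text of every single-line <title> element on a line,
# keeping the opening tag (and its attributes) untouched.
sub convert_line {
    my ($line) = @_;
    $line =~ s{(<title\b[^>]*>)(.*?)(</title>)}{$1 . title_case($2) . $3}ge;
    return $line;
}

print convert_line(qq{<title id="ABC4356">Anatomical terms: is referring to directions</title>\n});
```

Because the substitution matches the tag wherever it appears, the title no longer has to start at the beginning of the line. To process a file, call convert_line on each line as it is read and print the result.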
Upvotes: 2