Umesh Chandra Kahali

Reputation: 71

Search and Replace using Perl

I have some tags with values like the ones below:

<section>
<title id="ABC0123">is The human nervous system?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="DEF0123">Terms for anatomical directions in the nervous system</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="ABC4356">Anatomical terms: is referring to directions</title>
.
.
.

The output I need looks like this:

<section>
<title id="ABC0123">Is the Human Nervous System?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="DEF0123">Terms for Anatomical Directions in the Nervous System</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="ABC4356">Anatomical Terms: Is Referring to Directions</title>
.
.

How could I do this using Perl? All prepositions and articles should be in lower case, with the following exceptions.

If a word in @lowercase (say, "is") is the first word of the <title>, it should be capitalized. Likewise, any @lowercase word that comes directly after a colon in the <title> should be capitalized.

Upvotes: 0

Views: 140

Answers (2)

jimtut

Reputation: 2393

New answer to match the updated question (the sample input and desired output have changed since the original question). Updated again on Mar 9, 2014, per the OP's request to always uppercase the first word in a title tag.

#!/usr/bin/perl

use strict;
use warnings;

# Add your articles and prepositions here!!!
my @lowercase = qw(a an at for in is the to);

# Use a hash since lookup is easier later.
my %lowercase;
# Populate the hash with keys and values from @lowercase.
# Values could have been anything, but it needs to match the number of keys, so this is easiest.
@lowercase{@lowercase} = @lowercase;

open(my $fh, '<', 'foo.txt') or die "Cannot open foo.txt: $!";
while (my $line = <$fh>) {
  if ($line =~ m/^<title/i) {
    chomp $line;
    my @words;
    # Save the opening <title> tag (including any attributes)
    my $titleTag = $line;
    $titleTag =~ s/^(<[^>]*>).*/$1/;
    # Remove any tags in <brackets>
    $line =~ s/<[^>]*>//g;
    # Uppercase the first letter of every word, except those in @lowercase.
    # The first word, and any word right after a colon, is always capitalized.
    my $capitalize = 1;
    foreach my $word (split(' ', $line)) {
      # Compare case-insensitively so "The" is also lowercased to "the".
      if (!$capitalize && exists $lowercase{lc $word}) { push(@words, lc($word)) }
      else { push(@words, ucfirst($word)) }
      $capitalize = ($word =~ /:$/) ? 1 : 0;
    }
    print $titleTag . join(" ", @words) . "</title>\n";
  }
  else {
    print $line;
  }
}
close($fh);

This code makes 2 assumptions:

  1. Each <title>...</title> is on a single line. It never wraps to more than one line in the file.
  2. The opening <title> tag is at the beginning of the line. This can easily be changed in the code if desired.
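
Both assumptions could be relaxed by working on the whole input as one string and rewriting each <title> element with a single substitution. This is only a minimal sketch, assuming titles contain no nested tags; title_case and retitle are hypothetical helper names, and the sample string is invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Words kept in lower case unless they start the title or follow a colon.
my %lowercase = map { $_ => 1 } qw(a an at for in is the to);

# title_case() is a hypothetical helper, not part of the script above.
sub title_case {
    my ($text) = @_;
    my $capitalize = 1;    # the first word is always capitalized
    my @words;
    foreach my $word (split ' ', $text) {
        if (!$capitalize && exists $lowercase{lc $word}) {
            push @words, lc $word;
        }
        else {
            push @words, ucfirst $word;
        }
        # A word ending in a colon forces the next word to be capitalized.
        $capitalize = ($word =~ /:$/) ? 1 : 0;
    }
    return join ' ', @words;
}

# retitle() is also hypothetical: it rewrites every <title>...</title> in a
# string, wherever the tag starts and even if the title spans several lines.
sub retitle {
    my ($xml) = @_;
    $xml =~ s{(<title\b[^>]*>)(.*?)(</title>)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $open . title_case($body) . $close;
    }gise;
    return $xml;
}

# Invented sample: an indented title that wraps onto a second line.
my $sample = <<'XML';
<section>
  <title id="ABC4356">Anatomical terms:
    is referring to directions</title>
</section>
XML

print retitle($sample);
```

Note that whitespace inside a title, including line breaks, is collapsed to single spaces by split ' ', so a wrapped title comes out on one line. For a real file, the contents could be slurped with local $/ before calling retitle.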

Upvotes: 0

Neil Lunn

Reputation: 151190

Probably something like this then:

#!/usr/bin/env perl
use strict;
use warnings;

my $lines = qq#
<title>The human nervous system</title>
<title>Terms for anatomical directions in the nervous system</title>
<title>Anatomical terms referring to directions</title>
#;

foreach my $line ( split(/\n/, $lines ) ) {

    $line =~ s|</?title>||g;

    if ( $line =~ /\w+/ ) {               # Skip if blank
        print "<title>" . ucfirst(
           join(" ",
               map{ !/^(in|the|on|or|to|for)$/i ? ucfirst($_) : lc($_); }
               split(/\s/, $line )
           )
        ) ."<\/title>\n";

    }
}

Or loop over your file however you like. Either way, you will have to filter out the terms you don't want converted, as shown.
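
For instance, the same filter could be applied line by line while reading from a file handle. This is a rough sketch rather than a drop-in solution: convert_line is a hypothetical helper name, the in-memory file handle stands in for a real file, and the pattern is widened (my assumption) so that a <title> tag with attributes such as id is kept intact. It does not treat words after a colon specially.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Terms to leave in lower case; extend the list as needed.
my $small = qr/^(?:a|an|at|for|in|is|on|or|the|to)$/i;

# convert_line() is a hypothetical helper name used only in this sketch.
sub convert_line {
    my ($line) = @_;
    # Capture the opening tag (with any attributes), the text, and the rest.
    if ( $line =~ m{^(\s*<title\b[^>]*>)(.*?)(</title>.*)$}s ) {
        my ( $open, $text, $rest ) = ( $1, $2, $3 );
        my $title = ucfirst(
            join( ' ',
                map { $_ =~ $small ? lc($_) : ucfirst($_) }
                split( ' ', $text ) )
        );
        return $open . $title . $rest;
    }
    return $line;    # not a title line: pass through unchanged
}

# An in-memory file handle stands in for a real file here.
my $sample = <<'XML';
<section>
<title id="DEF0123">Terms for anatomical directions in The nervous system</title>
</section>
XML

open( my $in, '<', \$sample ) or die "Cannot open sample: $!";
while ( my $line = <$in> ) {
    print convert_line($line);
}
close($in);
```

To run it against a real file, the open call would take a file name instead of the reference to $sample.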

Upvotes: 2
