Reputation: 33
I am trying to create a script in Perl to replace text in all HTML files in a given directory. However, it is not working. Could anyone explain what I'm doing wrong?
my @files = glob "ACM_CCS/*.html";
foreach my $file (@files)
{
    open(FILE, $file) || die "File not found";
    my @lines = <FILE>;
    close(FILE);
    my @newlines;
    foreach(@lines) {
        $_ =~ s/Authors Here/Authors introduced this subject for the first time in this paper./g;
        #$_ =~ s/Authors Elsewhere/Authors introduced this subject in a previous paper./g;
        #$_ =~ s/D4-/D4: Is the supporting evidence described or cited?/g;
        push(@newlines,$_);
    }
    open(FILE, $file) || die "File not found";
    print FILE @newlines;
    close(FILE);
}
For example, I'd want to replace "D4-" with "D4: Is the...", etc. Thanks, I'd appreciate any tips.
Upvotes: 2
Views: 240
Reputation: 64909
You are using the two-argument version of open. If $file does not start with "<", ">", or ">>", it will be opened as a read filehandle. You cannot write to a read filehandle. To solve this, use the three-argument version of open:
open my $in, "<", $file or die "could not open $file: $!";
open my $out, ">", $file or die "could not open $file: $!";
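Put together, the read-then-write cycle might look like the sketch below. The helper name replace_in_file and its parameters are hypothetical, not part of the original script; the one thing to watch is that the file is read and closed before it is reopened with ">", because opening for writing truncates it.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every literal occurrence of $from with $to in $file.
sub replace_in_file {
    my ($file, $from, $to) = @_;

    # Read the whole file first; opening with ">" below truncates it.
    open my $in, "<", $file or die "could not open $file for reading: $!";
    my @lines = <$in>;
    close $in;

    # \Q...\E quotes regex metacharacters so $from is matched literally.
    s/\Q$from\E/$to/g for @lines;

    open my $out, ">", $file or die "could not open $file for writing: $!";
    print {$out} @lines;
    close $out or die "could not close $file: $!";
    return;
}
```

It could then be called once per file, for example: replace_in_file($_, "D4-", "D4: Is the supporting evidence described or cited?") for glob "ACM_CCS/*.html";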
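Also note the use of lexical filehandles ($in) instead of bareword filehandles (FILE). Lexical filehandles have many benefits over bareword filehandles: for example, they are scoped to the enclosing block rather than global, and they are closed automatically when they go out of scope. You use them just like you would use a bareword filehandle.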
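As a small illustration, a lexical filehandle is an ordinary scalar, so it can be handed to a subroutine, and it is closed automatically once it goes out of scope. The helper names count_lines and lines_in_file below are hypothetical and exist only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines available on any filehandle,
# which arrives here as a plain scalar argument.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    return $count;
}

# Hypothetical helper: open $file and count its lines. There is no explicit
# close; $in is closed automatically when it goes out of scope at the end
# of this sub.
sub lines_in_file {
    my ($file) = @_;
    open my $in, "<", $file or die "could not open $file: $!";
    return count_lines($in);
}
```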
Other things you might want to consider:
Use the default variable ($_): write s/foo/bar/; instead of $_ =~ s/foo/bar/;
Using an HTML parser rather than a regex may be very important for what you are doing. If you are not certain of the format these HTML files are in, then you could easily miss things. For instance, "Authors Here" and "Authors\nHere" mean the same thing to HTML, but your regex will miss the latter. You might want to take a look at XML::Twig (I know it says XML, but it handles HTML as well). It is a very easy-to-use XML/HTML parser.
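To make the whitespace problem concrete, here is a regex-only stopgap, not the parser approach recommended above. It assumes the whole file has been read into a single string rather than processed line by line, and the helper name replace_authors_here is hypothetical. Matching \s+ between the words catches both the space and the newline variants, but it still knows nothing about tags, comments, or entities.

```perl
use strict;
use warnings;

# Hypothetical helper: replace "Authors Here" in a string holding the whole
# file, even when the two words are separated by a newline or several spaces.
sub replace_authors_here {
    my ($html) = @_;
    $html =~ s/Authors\s+Here/Authors introduced this subject for the first time in this paper./g;
    return $html;
}
```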
Upvotes: 3