BeanBagKing

Reputation: 2093

Removing lines containing a string from a file with Perl

I'm trying to take a file INPUT and, if a line in that file contains a string, replace the line with something else (the entire line, including line breaks) or with nothing at all (removing the line as if it weren't there), writing the result to a new file.

Here's the relevant section of code:

while(<INPUT>){
    if ($_ =~ /  <openTag>/){
        chomp;
        print OUTPUT "Some_Replacement_String";
    } elsif ($_ =~ /  <\/closeTag>/) {
        chomp;
        print OUTPUT ""; #remove the line
    } else {
        chomp;
        print OUTPUT "$_\r\n"; #print the original line
    }
}

while(<INPUT>) should read one line at a time (if my understanding is correct) and store each line in the special variable $_.

However, when I run the above code, only the first if condition ever produces output: Some_Replacement_String is printed exactly once (1 line, out of a file with 1.3 million lines where I expect 600,000 replacements). This obviously isn't the behavior I expect. If I do something like while(<INPUT>){print OUTPUT $_;} I get a copy of the entire file, every line, so I know the entire file is being read (expected behavior).

What I'm trying to do is get a line, test it, do something with it, and move on to the next one.

If it helps with troubleshooting at all: if I use print $.; anywhere inside that while loop (or after it), I get 1. I expected this to be the "current line number for the last filehandle accessed", so by the time my while loop has gone through the entire file, it should equal the number of lines in the file, not 1.

I've tried a few other variations of this code, but I think this is the closest I've come. I assume there's a good reason I'm not getting the behavior I expect, can anyone tell me what it is?

Upvotes: 0

Views: 2226

Answers (1)

TLP

Reputation: 67900

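The problem you are describing indicates that Perl is reading your entire input file as a single line (one record). This can happen for several reasons, such as:

  • You have changed the input record separator $/ (for example with undef $/ or local $/;)
  • Your input file does not contain the line endings Perl expects (for example, it uses bare \r line endings, so there is no \n to split on)
  • You are running your script with the -0777 switch, which makes Perl read each file whole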
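A quick way to see which of these is happening is to read the same data under different conditions and watch both the number of loop iterations and $.. This is a minimal sketch using made-up sample strings read through in-memory filehandles (open on a scalar reference); count_records is a hypothetical helper, not part of the question's code, and setting $/ to undef stands in for the -0777 switch.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the real input file.
my $lf_text = "line one\nline two\nline three\n";   # normal \n endings
my $cr_text = "line one\rline two\rline three\r";   # bare \r endings

# Hypothetical helper: read $text through an in-memory filehandle with the
# given input record separator, and report how many times a while(<$fh>)
# loop ran and what $. held at the end.
sub count_records {
    my ($text, $separator) = @_;
    local $/ = $separator;   # undef here slurps the whole input, as -0777 does
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
    my $records = 0;
    $records++ while <$fh>;
    my $line_number = $.;    # read it before close, which resets $.
    close($fh);
    return ($records, $line_number);
}

for my $case (
    [ 'LF endings, default $/', $lf_text, "\n"  ],
    [ 'LF endings, $/ undef',   $lf_text, undef ],
    [ 'CR endings, default $/', $cr_text, "\n"  ],
) {
    my ($label, $text, $separator) = @$case;
    my ($records, $line_number) = count_records($text, $separator);
    print "$label: $records record(s), \$. = $line_number\n";
}
```

Only the first case loops three times; the other two see a single record and leave $. at 1, which matches the symptoms in the question.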

Some notes on your code:

if ($_ =~ /  <openTag>/){
    chomp;
    print OUTPUT "Some_Replacement_String";

No need to chomp a line you are not using.

} elsif ($_ =~ /  <\/closeTag>/) {
    chomp;
    print OUTPUT "";

This is quite redundant. You don't need to print an empty string (ever, really), nor chomp a value you're not using.

} else {
    chomp;
    print OUTPUT "$_\r\n"; #print the original line

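No need to remove newlines and then put them back. Also, you would normally use \n as your line ending, even on Windows: unless you have called binmode on the handle, Perl on Windows already translates \n to \r\n on output, so printing "\r\n" yourself gives you \r\r\n.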
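To make the difference concrete, here is a small sketch comparing the question's chomp-then-append-"\r\n" approach with simply printing $_. It uses a made-up two-line sample and in-memory filehandles; run_loop is a hypothetical helper that just runs a loop body over every line and collects what it printed.

```perl
use strict;
use warnings;

my $sample = "first\nsecond\n";

# Hypothetical helper: run $body once per line of $text (with the line in $_)
# and return everything $body printed to the output handle.
sub run_loop {
    my ($text, $body) = @_;
    my $result = '';
    open(my $in,  '<', \$text)   or die "Cannot open input: $!";
    open(my $out, '>', \$result) or die "Cannot open output: $!";
    while (<$in>) {
        $body->($out);
    }
    close($in);
    close($out);
    return $result;
}

# The question's approach: strip the newline, then append "\r\n".
my $rebuilt = run_loop($sample, sub { my $out = shift; chomp; print {$out} "$_\r\n" });

# Printing $_ untouched keeps whatever line ending the input already had.
my $passed = run_loop($sample, sub { my $out = shift; print {$out} $_ });

print 'rebuilt matches input: ', ($rebuilt eq $sample ? 'yes' : 'no'), "\n";   # no
print 'passed matches input:  ', ($passed  eq $sample ? 'yes' : 'no'), "\n";   # yes
```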

And, since you are chomping in every if-else clause, you might as well move that outside the entire if-block.

chomp;
if (....) {

But since you never rely on the line endings being gone, why bother using chomp at all?

When using the $_ variable, you can abbreviate some commands, as you are already doing with chomp. For example, a lone regex is applied to $_:

} elsif (/  <\/closeTag>/) {  # works splendidly
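Here is a self-contained sketch of both shortcuts working on $_ inside a while(<...>) loop, using a made-up three-line sample read from an in-memory filehandle in place of the real INPUT file.

```perl
use strict;
use warnings;

# Made-up sample lines read through an in-memory filehandle.
my $data = "  <openTag>\n  some text\n  </closeTag>\n";
open(my $in, '<', \$data) or die "Cannot open in-memory handle: $!";

my $closing = 0;
while (<$in>) {                      # each line is assigned to $_
    chomp;                           # no argument: same as chomp($_)
    $closing++ if /  <\/closeTag>/;  # bare regex: same as $_ =~ /  <\/closeTag>/
}
close($in);

print "$closing closing-tag line(s)\n";   # prints "1 closing-tag line(s)"
```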

When, like above, you have a regex that contains slashes, you can choose another delimiter for your regex, so that you do not need to escape the slashes:

} elsif (m#  </closeTag>#) {

But then you need to use the full notation of the m// operator, with the m in front.
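To confirm that changing the delimiter changes only the notation, this sketch matches one made-up line with three spellings of the same pattern. The m{} form is an extra example not used above; a bracketing pair is another common choice of delimiter.

```perl
use strict;
use warnings;

my $line = "  </closeTag>\n";

# The same pattern written with three different delimiters.
my $slashes = $line =~ /  <\/closeTag>/ ? 1 : 0;   # slash inside must be escaped
my $hashes  = $line =~ m#  </closeTag>#  ? 1 : 0;   # no escaping needed
my $braces  = $line =~ m{  </closeTag>}  ? 1 : 0;   # bracketing pair also works

print "slashes=$slashes hashes=$hashes braces=$braces\n";   # slashes=1 hashes=1 braces=1
```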

So, in short:

while(<INPUT>){
    if (/  <openTag>/){
        print OUTPUT "Some_Replacement_String";
    } elsif (m#  </closeTag>#) {
        # do nothing
    } else {
        print OUTPUT $_;   # print the original line
    }
}
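For completeness, here is one way the loop above might sit in a full script, using three-argument open and lexical filehandles instead of the INPUT/OUTPUT barewords. It is a sketch under assumptions: the file names are hypothetical and live in a temporary directory (via File::Temp) so it can run anywhere, and the sample input is made up. Note that the replacement string has no newline, so it runs into the following line, exactly as in the loop above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical file names inside a throwaway directory, so the sketch
# can run anywhere; substitute your real input and output paths.
my $dir      = tempdir(CLEANUP => 1);
my $in_path  = "$dir/input.txt";
my $out_path = "$dir/output.txt";

# Create a small sample input file.
open(my $seed, '>', $in_path) or die "Cannot write $in_path: $!";
print {$seed} "  <openTag>\n  keep me\n  </closeTag>\n  keep me too\n";
close($seed) or die "Cannot close $in_path: $!";

# Three-argument open with lexical filehandles.
open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
open(my $out, '>', $out_path) or die "Cannot open $out_path: $!";

while (<$in>) {
    if (/  <openTag>/) {
        # No "\n" here, so the replacement runs into the next line;
        # append one if each replacement should stay on its own line.
        print {$out} "Some_Replacement_String";
    } elsif (m#  </closeTag>#) {
        # do nothing: the line is dropped
    } else {
        print {$out} $_;   # original line, ending included
    }
}
close($in);
close($out) or die "Cannot close $out_path: $!";

# Read the result back (slurping with $/ undef) to show what was written.
open(my $check, '<', $out_path) or die "Cannot open $out_path: $!";
my $result = do { local $/; <$check> };
close($check);
print $result;
```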

And of course, the last two can be combined into one, with some negation logic:

} elsif (not m#  </closeTag>#) {
    print OUTPUT $_;
}
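Put together, the two-branch version of the loop might look like this. It is a sketch using a made-up sample and in-memory filehandles in place of INPUT and OUTPUT; lines containing the closing tag match neither branch and are silently dropped.

```perl
use strict;
use warnings;

# Made-up sample data and in-memory handles standing in for INPUT/OUTPUT.
my $input  = "  <openTag>\n  keep me\n  </closeTag>\n";
my $output = '';
open(my $in,  '<', \$input)  or die "Cannot open input: $!";
open(my $out, '>', \$output) or die "Cannot open output: $!";

while (<$in>) {
    if (/  <openTag>/) {
        print {$out} "Some_Replacement_String";   # replace the opening-tag line
    } elsif (not m#  </closeTag>#) {
        print {$out} $_;                          # neither tag: keep the line
    }
    # closing-tag lines fall through both branches and are not printed
}
close($in);
close($out);

print $output;   # prints "Some_Replacement_String  keep me"
```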

Upvotes: 4
