Somesh Pokarne

Reputation: 27

Copying content from one file to another file using perl

The following code copies the content of readfile.txt to writefile.txt. Instead of copying up to the end of the file, I want to copy only up to some keyword.

use strict;
use warnings;

use File::Slurp;

my @lines = read_file('readfile.txt');

while ( my $line = shift @lines) {
  next unless ($line =~ m/END OF HEADER/);
  last; # here suggest some other logic 
}

append_file('writefile.txt', @lines);

Upvotes: 2

Views: 6933

Answers (2)

7stud

Reputation: 48599

data.txt:

fsffs
sfsfsf 
sfSDFF
END OF HEADER 
{ dsgs xdgfxdg zFZ } 
dgdbg 
vfraeer 

Code:

use strict;
use warnings; 
use 5.020;    # enables say
use autodie;  # open and close die automatically on failure

my $infile = 'data.txt';
my $header_file = 'header.txt';
my $after_header_file = 'after_header.txt';
open my $DATA, '<', $infile;
open my $HEADER, '>', $header_file;
open my $AFTER_HEADER, '>', $after_header_file;

{
    local $/ =  "END OF HEADER";

    my $header = <$DATA>; 
    say {$HEADER} $header;

    my $rest = <$DATA>;
    say {$AFTER_HEADER} $rest;
}


close $DATA;
close $HEADER;
close $AFTER_HEADER;

say "Created files: $header_file, $after_header_file";

Output:

$ perl 1.pl 
Created files: header.txt, after_header.txt

$ cat header.txt 
fsffs
sfsfsf 
sfSDFF
END OF HEADER

$ cat after_header.txt 

{ dsgs xdgfxdg zFZ } 
dgdbg 
vfraeer 

$/ specifies the input record separator, which by default is a newline. Therefore, when you read from a file:

while (my $x = <$INFILE>) {

}

each value of $x is a sequence of characters up to and including the input record separator, i.e. a newline, which is what we normally think of as a line of text in a file. Often, we chomp off the newline (the input record separator) at the end of the text:

while (my $x = <$INFILE>) {
    chomp $x;
    say "$x is a dog";
}

But, you can set the input record separator to anything you want, like your "END OF HEADER" text. That means a line will be all the text up to and including the input record separator, which in this case is "END OF HEADER". For example, a line will be: "abc\ndef\nghi\nEND OF HEADER". Furthermore, chomp() will now remove "END OF HEADER" from the end of its argument, so you could chomp your line if you don't want the "END OF HEADER" marker in the output file.

If perl cannot find the input record separator, it keeps reading until it hits the end of the file and then returns all the text that was read.
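To make the points above concrete, here is a minimal sketch that reads from an in-memory string instead of a real file. The sample text and the variable names are made up for illustration. It shows a record ending with the separator, chomp removing that separator, and perl returning the rest of the input when the separator does not appear again:

```perl
use strict;
use warnings;

# Hypothetical sample data held in a string; opening a reference to a
# scalar gives a read handle on that string, so no file is needed.
my $text = "abc\ndef\nEND OF HEADER\nghi\njkl\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my ($header, $rest);
{
    local $/ = "END OF HEADER";

    $header = <$fh>;   # everything up to and including "END OF HEADER"
    chomp $header;     # chomp strips the current value of $/ from the end

    $rest = <$fh>;     # separator not found again, so this reads to end of file
}
close $fh;

print "header: [$header]\n";
print "rest:   [$rest]\n";
```

After this runs, $header holds the two lines before the marker, and $rest starts with the newline that followed the marker in the input.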

You can use those operations to your advantage when you want to seek to some specific text in a file.

Declaring a global variable with local gives it a temporary value: when the closing brace of the surrounding block is reached, perl sets the variable back to the value it had just before the local statement:

#Here, by default $/ = "\n", but some code out here could have
#also set $/ to something else

{
    local $/ =  "END OF HEADER";


} # $/ gets set back to whatever value it had before this block

When you change one of perl's predefined global variables, it's considered good practice to only change the variable for as long as you need to use the variable, then change the variable back to what it was.
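A small runnable check of that restore behaviour might look like this. It is only a sketch, and the variable names $before, $inside and $after are invented for the example:

```perl
use strict;
use warnings;

my $before = $/;    # "\n" unless some earlier code changed it
my $inside;
{
    local $/ = "END OF HEADER";
    $inside = $/;   # the temporary value, in effect until the block exits
}
my $after = $/;     # restored automatically when the block ended

print "restored\n" if $after eq $before;
```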

If you want to target just the text between the braces, you can do:

data.txt:

fsffs
sfsfsf 
sfSDFF
END OF HEADER { dsgs xdgfxdg zFZ } 
dgdbg 
vfraeer 

Code snippet:

    ...
    ...
    {
        local $/ = 'END OF HEADER {';
        my $pre_brace = <$DATA>; 

        $/ = '}';
        my $target_text = <$DATA>;
        chomp $target_text;  #Removes closing brace
        say "->$target_text<-";
    }

--output:--
-> dsgs xdgfxdg zFZ <-

Upvotes: 1

haukex

Reputation: 3013

next will continue to the next iteration of the loop, effectively skipping the rest of the statements in the loop for that iteration (in this case, the last).

last will immediately exit the loop, which sounds like what you want. So you should be able to simply put the conditional statement on the last.

Also, I'm not sure why you want to read the entire file into memory to iterate over its lines? Why not just use a regular while(<>)? And I would recommend avoiding File::Slurp, it has some long-standing issues.

You don't show any example input with expected output, and your description is unclear - you said "i want to copy upto some keyword" but in your code you use shift, which removes items from the beginning of the array.

Do you want to remove the lines before or after and including or not including "END OF HEADER"?

This code will copy over only the header, i.e. the lines before END OF HEADER (the marker line itself is not copied):

use warnings;
use strict;

my $infile  = 'readfile.txt';
my $outfile = 'writefile.txt';

open my $ifh, '<', $infile  or die "$infile: $!";
open my $ofh, '>', $outfile or die "$outfile: $!";
while (<$ifh>) {
    last if /END OF HEADER/;
    print $ofh $_;
}
close $ifh;
close $ofh;

Whereas if you want to copy everything after the header, you could replace the while above with:

while (<$ifh>) {
    last if /END OF HEADER/;
}
while (<$ifh>) {
    print $ofh $_;
}

The first loop does nothing until it sees END OF HEADER and then breaks out. The second loop continues reading from the same filehandle and prints the lines after the header. The END OF HEADER line itself is consumed by the first loop and is not printed.
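If the END OF HEADER line itself should be part of the copied header, one possible variation is to print first and test afterwards. This is only a sketch: it uses in-memory strings in place of readfile.txt and writefile.txt so that it can run on its own, and the sample input is invented.

```perl
use strict;
use warnings;

# Hypothetical input; a real script would open readfile.txt and writefile.txt.
my $input  = "line 1\nline 2\nEND OF HEADER\nbody 1\nbody 2\n";
my $output = '';

open my $ifh, '<', \$input  or die "input: $!";
open my $ofh, '>', \$output or die "output: $!";
while (my $line = <$ifh>) {
    print {$ofh} $line;                 # copy the line first...
    last if $line =~ /END OF HEADER/;   # ...then stop once the marker has been copied
}
close $ifh;
close $ofh;

print $output;
```

Swapping the order of the print and the last gives the marker-excluded behaviour of the loop shown earlier.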

Upvotes: 7
