Sakti

Reputation: 61

Perl one-liner to split a file at every occurrence of a given word

Another question from me. I have a file of the form:

>seq1
123 234 56
167 332 22
23 456 098
>seq2
123 234 56
167 332 22
23 456 098

I want to save each >seq# record to its own file, like this:

File 1:

>seq1
123 234 56
167 332 22
23 456 098

File 2:

>seq2
123 234 56
167 332 22
23 456 098

I could use a Perl script, but I was wondering how this could be done with a Perl one-liner, just to improve my Perl knowledge.

Thanks!!

Upvotes: 2

Views: 471

Answers (2)

TLP

Reputation: 67900

Looking at Jonathan's answer, I came up with something that is odd enough to post a new answer for. I would like to add that this should be considered an exercise example (perhaps obfuscation), and not in any way proper code. Full credit for the solution goes to Jonathan. Also, this is a dangerous solution, as explained at the bottom.

perl -ple 'open STDOUT, $_' yourfile.txt

This relies on lines that begin with >, such as >seq1, being passed to the old 2-argument open, a trick Jonathan discovered. E.g. open $fh, ">seq1" will create (or overwrite) the file seq1 and open it for writing.

At the same time, any line that does not start with a valid "mode" symbol -- <, >, | etc. -- is opened for reading by default. If we gamble that no files named 123 234 56 etc. exist in that directory, we can rely on our open failing silently and leaving the previously opened STDOUT file handle in place.

By using the -l option, we do not need to chomp $_ so that the open does not fail, nor do we need to add a newline to the print. At the same time, the -p option will take care of creating the while loop and do the printing.

Because the print by default goes to STDOUT, all we need to do is reopen the STDOUT file handle, and the content of the input file takes care of the rest.

The full code of this one-liner with comments to denote which parts come from which switch:

BEGIN { $/ = "\n"; $\ = "\n"; }    # -l, gives newlines to print
while (<>) {                       # -p 
    chomp $_;                      # -l
    open STDOUT, $_;               # our code
}
continue {
    print STDOUT $_;               # -p
}

Note: This code exposes the full power of the open command, which is dangerous: a line in the input that ends in |, for example, is run as a command, so the input file can execute arbitrary commands on your system. This is a side effect of using the 2-argument open.
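To make the danger concrete, here is a small sketch that is not part of the one-liner above. The sample line and the helper names read_two_arg and read_three_arg are made up for illustration, and it assumes a Unix-like system where echo is available. It passes the same string to the 2-argument and the 3-argument form of open.

```perl
use strict;
use warnings;

# Hypothetical input line, standing in for a hostile line in the data file.
my $line = 'echo this came from a shell command |';

# 2-argument open: the trailing "|" makes Perl run the text as a command
# and read that command's output.
sub read_two_arg {
    my ($spec) = @_;
    open(my $fh, $spec) or return;
    my @out = <$fh>;
    close $fh;
    return join '', @out;
}

# 3-argument open: the mode is fixed to "<", so the same text is only ever
# treated as a file name (one that presumably does not exist here).
sub read_three_arg {
    my ($name) = @_;
    open(my $fh, '<', $name) or return;
    my @out = <$fh>;
    close $fh;
    return join '', @out;
}

print "2-arg: ", read_two_arg($line)   // "open failed\n";
print "3-arg: ", read_three_arg($line) // "open failed\n";
```

With the 2-argument form the command should run and its output is read back. With the 3-argument form the open should simply fail, because the string is never parsed for a mode.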

Upvotes: 2

Jonathan Leffler

Reputation: 753605

This is a rather minimal script that does the job:

use strict;
use warnings;
my $fh = *STDOUT;

while (<>)
{
    chomp;
    if (m/^>/)
    {
        close $fh;
        open $fh, $_ or die "Failed to open $_: $!";
    }
    print $fh "$_\n";
}

The my $fh = *STDOUT; line means that if there is stuff before the first >file line, it is echoed to standard output.

With that as a basis, you can decide to flatten it to one line, giving up error checking, explicit closing of files, strictures, and readability:

perl -e 'while(<>){chomp;open$f,$_ if(m/^>/);print$f "$_\n";}'

I couldn't possibly recommend that, though. (Yes, both the blanks are necessary.)
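For comparison, here is a sketch of a more defensive take on the same idea. It is not taken from either answer. The helper name split_records and the optional output directory are assumptions made for illustration, and it assumes header names consist only of word characters, as in >seq1. It uses the 3-argument open, so the header text is never interpreted as a mode or a command.

```perl
use strict;
use warnings;

# Roughly the same thing as a one-liner (same assumptions, no strictures):
#   perl -ne 'if (/^>(\w+)/) { open $fh, ">", $1 or die "$1: $!" } print $fh $_ if $fh' yourfile.txt

# Hypothetical helper: split $input into one file per ">name" header,
# writing the files into $outdir. Returns the list of files created.
sub split_records {
    my ($input, $outdir) = @_;
    open(my $in, '<', $input) or die "Cannot read $input: $!";
    my ($out, @created);
    while (my $line = <$in>) {
        if ($line =~ /^>(\w+)/) {
            # Only word characters are accepted as the file name, so a header
            # cannot smuggle in a path, a pipe, or a mode symbol.
            my $path = "$outdir/$1";
            close $out if $out;
            open($out, '>', $path) or die "Cannot write $path: $!";
            push @created, $path;
        }
        # Lines before the first header are skipped rather than echoed.
        print {$out} $line if $out;
    }
    close $out if $out;
    close $in;
    return @created;
}

# Run as a script: perl split.pl yourfile.txt [outdir]
if (@ARGV) {
    my ($input, $outdir) = @ARGV;
    $outdir = '.' unless defined $outdir;
    print "$_\n" for split_records($input, $outdir);
}
```

Unlike the scripts above, the header line is matched against a pattern before anything is opened, which is the main design difference. The price is that headers with other characters in the name are only partly used or ignored.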

Upvotes: 2
