What changes are needed in the loop to get the output?

Question

I have files whose header lines start with the > character. If a header has the format '>anything1|anything2', I use this script to trim it and get the output header '>anything1':

while (<>) {
    if (/^(>[^|]*)/) {
        print "$1\n";
    } else {
        print;
    }
}

But now some headers in my files are longer, like this:

>anything1|anything2|anything3 bla bla bla /#

and some headers are like:

>anything1

Given this mix of header types in a single file, I want output that trims the longer headers to their first two fields (that is, '>anything1|anything2' for the long header above) and keeps only the first field for the short headers (i.e., '>anything1' for the short header above). What changes do I have to make to my loop?

Thanks

zdim · Accepted Answer

How about getting away from that regex and splitting the header into fields instead:

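while (<>)
{
    if (/^>/)
    {
        chomp;                         # otherwise the newline stays in the last field
        my @fields = split /\|/, $_;   # break the header on literal '|'

        # One or two fields: keep only the first. Three or more: keep the first two.
        if (@fields <= 2) {  print "$fields[0]\n" }
        else              {  print join('|', @fields[0,1]), "\n" }

        next;
    }

    print;                             # non-header lines pass through unchanged
}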

Please consider possible edge cases. It's easy when you have an array.
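To make a couple of those edge cases concrete, here is a small sketch that wraps the same field logic in a sub so it can be tried on individual headers. The name `trim_header` is made up for illustration, and both the handling of Windows line endings and the choice to keep trailing empty fields (the `-1` limit) are assumptions about what the input might look like, not something stated in the question.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
sub trim_header {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # drop an LF or CRLF line ending
    my @fields = split /\|/, $line, -1;    # -1 keeps trailing empty fields
    return @fields > 2 ? join('|', @fields[0,1]) : $fields[0];
}

# Try it on a short header, a two-field header, a long header,
# and a header that ends in '|'.
print trim_header($_), "\n" for
    ">anything1",
    ">anything1|anything2",
    ">anything1|anything2|anything3 bla bla bla /#",
    ">anything1|anything2|";
```

Without the `-1` limit, `split` drops trailing empty fields, so a header ending in `|` would count as having only two fields. Which of the two behaviors is wanted depends on the data.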

With a regex, one can match the cases separately, or carefully craft a single pattern that bundles those two or three scenarios, which is far more involved.
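For comparison, a single-pattern version might look roughly like the sketch below. It assumes the rule is "keep the first two fields only when a third one follows, otherwise keep the first field"; the sub name `trim_header_re` is hypothetical. The lookahead `(?=\|)` is what lets the second field match only when another `|` comes after it.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
sub trim_header_re {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop the line ending so it cannot end up in a field
    # First field, plus the second field only if another '|' follows it.
    return $line =~ /^(>[^|]*(?:\|[^|]*(?=\|))?)/ ? $1 : $line;
}

print trim_header_re($_), "\n" for
    ">anything1",
    ">anything1|anything2",
    ">anything1|anything2|anything3 bla bla bla /#";
```

Unlike a plain `split`, which drops trailing empty fields, this pattern treats a trailing `|` as the start of a third field, so the two approaches can disagree on headers that end in `|`.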
