user2814482

Reputation: 651

How to skip to a specific line and parse after that - Perl

I have a file with some text in the first few lines, followed by tabular output. I want to parse the first line and then skip to the tabular output, but I am having some trouble (even though it sounds simple). My strategy is to find the header

example input file:

Query         [VOG0001]|NC_002014-NP_040572.1| 1296..1562 + 88 aa|G V protein
Match_columns 100
No_of_seqs    7 out of 16
Neff          2.6 

No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
1 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43  221.5   0.0   87    2-89      1-87  (87)
2 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43  221.5   0.0   87    2-89      1-87  (87)
3 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43  221.5   0.0   87    2-89      1-87  (87)

attempted parsing script:

open (IN, $hhr_report) or die "cannot open $hhr_report\n";
while (my $line=<IN>){
    if ($line =~/^Query/){
            my @query=split(/\|/,$line);
            my $vogL=$query[0];
            my @vogL2=split(/\s+/,$vogL);
            $vog=$vogL2[1];
            $vog=~ s/\[//g;
            $vog=~ s/\]//g;
    print "query_array:\t@query\n";
    print "query_vog:\t$vog\n";
    }
    next until ($line =~/Query HMM/);
    #next if ($line =~/Query HMM/);
    #next until ($line =~/^No\s[0-9]+/);
    print "$line\n";
    my @columns = split(/\s+/,$line);

... }

I'm not sure if I am missing something simple. Right now I only seem to be parsing the header line (the one containing Query HMM), but I want to parse the lines after it.

Any help appreciated.

Upvotes: 0

Views: 238

Answers (2)

cbmckay

Reputation: 496

I think what you are trying to accomplish can be done more simply. I understand you want to:

  1. Get the first line of the file and process it
  2. Skip the next lines until the table
  3. Process the tabular data

If so, you could do something like this:

open (IN, $hhr_report) or die "cannot open $hhr_report\n";

# Get the first line of the file and process it:
my $first_line = <IN>;  # read from IN, the handle opened above
my @query=split(/\|/,$first_line);
my $vogL=$query[0];
my @vogL2=split(/\s+/,$vogL);
my $vog=$vogL2[1];
$vog=~ s/\[//g; #/
$vog=~ s/\]//g; #/
print "query_array:\t@query\n";
print "query_vog:\t$vog\n";

# Work on the rest of the file:
my $in_table = 0; 
while (my $line=<IN>){
    if ($in_table) {
        # process your columns here
        print "$line\n";
        my @columns = split(/\s+/,$line);
        ... # the rest of your processing
    }
    # read (and throw away) lines until you match the table header:
    $in_table = 1 if $line =~/Query HMM/;
    # next time through the while loop you'll have your 
    # first tabular data and the $in_table will be true
}
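
For what it's worth, the loop in the question seems to stall on the header because `next until (...)` is checked again for every line read, so in effect it behaves like `next unless` and only the line containing `Query HMM` ever gets past it. Below is a minimal self-contained sketch of the flag approach, run against the sample data from the question through an in-memory filehandle. The sub name `parse_hhr`, the regex that captures the VOG id, and stopping at the first blank line after the table are my assumptions, not something the question specifies.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes the report text, returns the VOG id and the table rows.
sub parse_hhr {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "cannot open in-memory report: $!\n";

    my ($vog, @rows);
    my $in_table = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if (!$in_table) {
            # First line: pull the id out of "Query   [VOG0001]|..."
            $vog = $1 if $line =~ /^Query\s+\[([^\]]+)\]/;
            # Throw lines away until the table header has been seen
            $in_table = 1 if $line =~ /Query HMM/;
            next;
        }
        last if $line !~ /\S/;    # assumption: a blank line ends the table
        # Naive whitespace split, as in the code above; the Hit text is split too
        push @rows, [ split ' ', $line ];
    }
    close $fh;
    return ($vog, \@rows);
}

# Sample data copied from the question
my $report = <<'END';
Query         [VOG0001]|NC_002014-NP_040572.1| 1296..1562 + 88 aa|G V protein
Match_columns 100
No_of_seqs    7 out of 16
Neff          2.6

No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
1 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43  221.5   0.0   87    2-89      1-87  (87)
2 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43  221.5   0.0   87    2-89      1-87  (87)
3 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43  221.5   0.0   87    2-89      1-87  (87)
END

my ($vog, $rows) = parse_hhr($report);
print "query_vog:\t$vog\n";
print "table rows:\t", scalar(@$rows), "\n";
```

To read a real file instead, you would open `$hhr_report` with the three-argument form of `open` and loop over that handle in the same way.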

Upvotes: 0

bytepusher

Reputation: 1578

I would discard everything up to the header line (parsing the first line along the way if you need it), and then begin parsing the lines after the header, like so:

#!/usr/bin/env perl
use strict;
use warnings;

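# Assumption: the report path is passed as the first command-line argument
my $hhr_report = shift @ARGV or die "Usage: $0 <hhr_report>\n";

open (my $fh, "<", $hhr_report) or die "Cannot open $hhr_report: $!";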

my $header;
do {
    $header = <$fh>;
    # If you need to parse lines before the header for some reason,
    # do that here
} while ( defined $header && !is_header($header) );  # also stop at end of file

# If you like, parse the header column to get the column names

my @lines;

while ( my $line = <$fh> ){

    my @columns = split_line($line);
    push @lines, \@columns;

}

sub is_header {
    my $line = shift;

    return $line =~ /^No\sHit/ ? 1 : 0;
}

sub split_line {
    my $line = shift;
    # Here, use a regex to split the columns, depending on what you need. 
    # You could also consider outputting errors if the line is malformatted or missing important values

}
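
As a rough idea of what `split_line` could look like for the sample table in the question: the Hit column contains spaces, so a plain whitespace split breaks it apart. This sketch matches the row number first, then lets the numeric columns pin down where the hit text ends. The column patterns are assumptions read off the three sample rows, so check them against your real reports.

```perl
use strict;
use warnings;

# Sketch only: returns the 11 fields of a table row, or an empty list
# (with a warning) if the line does not look like a table row.
sub split_line {
    my $line = shift;
    chomp $line;

    my @fields = $line =~ m/
        ^\s*(\d+)\s+              # No
        (.*?)\s+                  # Hit, may contain spaces
        (\d+(?:\.\d+)?)\s+        # Prob
        (\S+)\s+                  # E-value
        (\S+)\s+                  # P-value
        (-?\d+(?:\.\d+)?)\s+      # Score
        (-?\d+(?:\.\d+)?)\s+      # SS
        (\d+)\s+                  # Cols
        (\d+-\d+)\s+              # Query HMM range
        (\d+-\d+)\s*              # Template HMM range
        \((\d+)\)\s*$             # template length in parentheses
    /x;

    warn "Skipping malformed line: $line\n" unless @fields;
    return @fields;
}

# Row copied from the question
my $row = '1 d1gvpa_ b.40.4.7 (A:) Gene V p 100.0 1.6E-38 1.4E-43  221.5   0.0   87    2-89      1-87  (87)';
my @columns = split_line($row);
print join('|', @columns), "\n";
```

If the report has further sections after the table, you would also want to leave the `while` loop at the first blank line rather than feeding every remaining line to `split_line`.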

Upvotes: 1
