user3032425

Pattern matching with array in Perl

This is the script I am using for pattern matching. I'm not getting the exact output I need; please help me out.

#!/usr/bin/perl5.14.4
open(LIST, "/home/guest/Desktop/hpresult.txt") 
    or die ("Couldn't open the  Result");
@list = <LIST>;
close LIST;
open(OUTPUT, ">/home/guest/Desktop/sortresult3") 
    or die ("couldn't write the file");
$line = (@list);
foreach $line(@list) {
    if($line =~ m/>/g) {
        $pdbid = substr($line, 0);
    }
    if($line =~ m/Found/g) {
        $id = $line;
        print OUTPUT $pdbid . $id;
    }
}

INPUT

hpresult.txt  
>3ior_B  
Found PPPPPPPPPPP at 397 to 407 of length 11  
Found QQQQQQQQQ at 388 to 396 of length 9  

>3ior_C  
Found QQQQQQQQQQQQQ at 388 to 400 of length 13  

>3ios_A  

>3iot_A

OUTPUT (which I'm getting)

>3ior_B  
Found PPPPPPPPPPP at 397 to 407 of length 11  
>3ior_B  
Found QQQQQQQQQ at 388 to 396 of length 9  
>3ior_C  
Found QQQQQQQQQQQQQ at 388 to 400 of length 13  

Desired OUTPUT

>3ior_B  
Found PPPPPPPPPPP at 397 to 407 of length 11  
Found QQQQQQQQQ at 388 to 396 of length 9  

>3ior_C  
Found QQQQQQQQQQQQQ at 388 to 400 of length 13  

Please help me with this.

Answers (3)

Kenosis

Your file looks like FASTA, and it also appears that you're working with sequence positions/lengths.

Like FASTA files, your file contains records that begin with ">", so we can read it in those chunks by setting Perl's input record separator $/ to ">" and then looking for "Found" in each chunk. If "Found" is found, print the chunk, putting back the ">" that chomp removed:

use strict;
use warnings;

local $/ = '>';

while (<>) {
    chomp;
    print ">$_" if /Found/;
}

Usage: perl script.pl inFile >outFile

Output on your dataset:

>3ior_B  
Found PPPPPPPPPPP at 397 to 407 of length 11  
Found QQQQQQQQQ at 388 to 396 of length 9  

>3ior_C  
Found QQQQQQQQQQQQQ at 388 to 400 of length 13
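To see exactly what the $/ = '>' trick hands to the loop body, here is a small self-contained sketch that reads the sample data from a string (through an in-memory filehandle) instead of a file, and collects the kept chunks in an array. The names $data and @chunks are only for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

# Sample input from the question, held in a string for demonstration.
my $data = <<'END';
>3ior_B
Found PPPPPPPPPPP at 397 to 407 of length 11
Found QQQQQQQQQ at 388 to 396 of length 9

>3ior_C
Found QQQQQQQQQQQQQ at 388 to 400 of length 13

>3ios_A

>3iot_A
END

# Open the string as a read-only in-memory filehandle.
open my $in, '<', \$data or die "Cannot open in-memory data: $!";

my @chunks;
{
    local $/ = '>';              # a "line" now ends at each '>'
    while (my $chunk = <$in>) {
        chomp $chunk;            # chomp removes the trailing '>' (the current $/)
        push @chunks, $chunk if $chunk =~ /Found/;
    }
}
close $in;

# Put the '>' back in front of each kept record, as the answer's print does.
print ">$_" for @chunks;
```

The first chunk read is just the leading ">" (empty after chomp), and the last one (>3iot_A) has no trailing ">", so chomp leaves it alone; neither contains "Found", so both are skipped.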

Hope this helps!

TLP

Some notes on your code. When you've fixed these, you should have an entirely different program to deal with, and should perhaps ask a new question:

Always, always use

use strict;
use warnings;

Especially when you are new to Perl. strict will help you avoid confusion about scope and variable names (by forcing you to declare variables explicitly with my), among other things. warnings will warn you about things you are doing that might be unintentional. Any time you spend learning to use these two pragmas you will get back later in reduced debugging time and more control over your program.
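As a quick illustration of what strict buys you, this sketch compiles a snippet that uses an undeclared variable (the question's $pdbid) through a string eval, so the compile-time error is captured instead of killing the script. The exact wording of the message differs slightly between Perl versions, but it starts with "Global symbol".

```perl
use strict;
use warnings;

# Compile a snippet under 'use strict' at run time via string eval,
# so the compile error is caught instead of aborting this script.
my $code = 'use strict; $pdbid = ">3ior_B"; 1';
my $ok   = eval $code;
my $err  = $@;    # save the error before another eval clears $@

if (!$ok) {
    print "strict rejected the snippet:\n$err";
}

# The same snippet with the variable declared via 'my' compiles and runs.
my $fixed    = 'use strict; my $pdbid = ">3ior_B"; 1';
my $fixed_ok = eval $fixed;
print "declared version compiled and ran\n" if $fixed_ok;
```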

open(LIST, "/home/guest/Desktop/hpresult.txt") 
    or die ("Couldn't open the  Result");
@list = <LIST>;
close LIST;
open(OUTPUT, ">/home/guest/Desktop/sortresult3") 
    or die ("couldn't write the file");

Here you open two filehandles and slurp a file into an array. In a small program like this it is, in my opinion, better not to hardcode the input and output files; instead, use the diamond operator <> and rely on shell redirection to save the output to a file. Slurping the whole file into an array is also unnecessary here, since the data can be processed one line at a time.

Here's the basic gist of it, replacing all of this file handling:

my $junk = <>;   # read and discard the first line
while (<>) {     # read the files named in @ARGV (or STDIN) line by line
    # process lines here
}

If you do want to open files yourself, use the three-argument open (with an explicit MODE) and a lexical filehandle:

open my $fh, "<", $file or die "Cannot open file for reading: $!";
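Here is a minimal, self-contained sketch of three-argument open with lexical filehandles, writing a small file and reading it back line by line. It uses a temporary directory from the core File::Temp module in place of the question's Desktop paths, so the file name hpresult.txt here is only a stand-in.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Temporary directory (deleted at program exit) standing in for /home/guest/Desktop.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'hpresult.txt');

# Three-argument open: the mode ('>' or '<') is its own argument,
# so it can never be confused with characters in the file name.
open my $out, '>', $file or die "Cannot open '$file' for writing: $!";
print {$out} ">3ior_B\n";
print {$out} "Found QQQQQQQQQ at 388 to 396 of length 9\n";
close $out or die "Cannot close '$file': $!";

# Read it back line by line through a lexical handle; $! holds the OS error if open fails.
open my $in, '<', $file or die "Cannot open '$file' for reading: $!";
my @seen;
while (my $line = <$in>) {
    chomp $line;
    push @seen, $line;
}
close $in;

print "$_\n" for @seen;
```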

This line:

$line = (@list);

is completely redundant, given the for loop on the next line. It evaluates @list in scalar context, so it assigns the number of elements in @list to $line (not the last element). The foreach on the next line then localizes $line and aliases it to each element in turn. However, after the loop $line reverts to that element count, which will almost certainly confuse you. See this question where they ask about localized variables.
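To see both effects, here is a small sketch using a package variable $line, as in the question, and a three-line stand-in for @list taken from the sample input:

```perl
use strict;
use warnings;

our $line;    # a package variable, like the question's $line
my @list = (
    ">3ior_B\n",
    "Found PPPPPPPPPPP at 397 to 407 of length 11\n",
    "Found QQQQQQQQQ at 388 to 396 of length 9\n",
);

$line = (@list);      # array in scalar context: its element count
my $before = $line;   # 3, not the last element

foreach $line (@list) {
    # inside the loop $line is aliased to each element in turn
}

my $after = $line;    # the loop restored the previous value: 3 again
print "before loop: $before, after loop: $after\n";
```

This prints "before loop: 3, after loop: 3".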

I am not sure what you are trying to do here. I assume that you might be trying to take the first line in the file and remove it. If that is the case, you can simply do

shift @list;

But as you will see, since reading a file into an array is not the best solution, this is not something we will be using.

if($line =~ m/>/g) {
    $pdbid = substr($line, 0);
}
if($line =~ m/Found/g) {
    $id = $line;
    print OUTPUT $pdbid . $id;
}

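As ikegami says, it is pointless to use the /g modifier in an if statement. Worse, in scalar context /g makes the match remember where it stopped (pos), so a later /g match on the same string starts from that position. Also, substr($line, 0) simply returns a complete copy of $line. Not sure what you were trying to do there, but if a copy is what you want, it is simpler (and less confusing) to just write $pdbid = $line.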
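A short sketch of that pos() pitfall, using one header line from the sample input (the variable names are illustrative):

```perl
use strict;
use warnings;

my $line = ">3ior_B";

# /g in scalar context remembers where the last match ended (pos).
my $first  = ($line =~ />/g) ? 1 : 0;   # matches; pos($line) is now 1
my $pos    = pos($line);                 # 1
my $second = ($line =~ />/g) ? 1 : 0;   # search resumes at offset 1 and fails; pos is reset
my $third  = ($line =~ />/)  ? 1 : 0;   # without /g the match always starts at offset 0

print "first=$first pos=$pos second=$second third=$third\n";
```

This prints "first=1 pos=1 second=0 third=1": the same test on the same string gives different answers only because of /g.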

If you want your desired output, you will need to tell the headers apart, perhaps by using a variable to remember the one you have already printed:

if ($line =~ /Found/) {
    print $pdbid if $printed_pdbid ne $pdbid;
    print $line;
    $printed_pdbid = $pdbid;
}

So, basically what you need is

use strict;
use warnings;

my $junk = <>;                             # discard the first line
my $old = "";                              # to avoid an undef warning
my $pdbid;
while (<>) {
    if (/^>/) {                            # if line begins with >
        $pdbid = $_;                       # store header
    } elsif (/Found/) {                    # a "Found" line under the current header
        print $pdbid if $old ne $pdbid;    # print each header only once
        $old = $pdbid;                     # remember the last header printed
        print $_;                          # print current line
    }
}

Which will give the following output:

>3ior_B
Found PPPPPPPPPPP at 397 to 407 of length 11
Found QQQQQQQQQ at 388 to 396 of length 9
>3ior_C
Found QQQQQQQQQQQQQ at 388 to 400 of length 13
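This output does not have the blank line between groups that the desired output shows. If you want that too, here is one possible sketch (not the program above): a hypothetical helper, group_found, that takes the lines as a list so it is easy to try out, and returns the text with a blank line between header groups. The same logic works line by line inside a while (<>) loop.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one block per header that has at least
# one "Found" line, with a blank line between blocks.
sub group_found {
    my @lines = @_;
    my ($header, $started, @blocks);
    for my $line (@lines) {
        if ($line =~ /^>/) {
            $header  = $line;              # remember the most recent header
            $started = 0;                  # no block started for it yet
        }
        elsif ($line =~ /Found/ && defined $header) {
            if (!$started) {
                push @blocks, $header;     # start a new block with its header
                $started = 1;
            }
            $blocks[-1] .= $line;          # append the "Found" line to the current block
        }
    }
    return join "\n", @blocks;             # the extra "\n" leaves a blank line between blocks
}

my @input = (
    ">3ior_B\n",
    "Found PPPPPPPPPPP at 397 to 407 of length 11\n",
    "Found QQQQQQQQQ at 388 to 396 of length 9\n",
    "\n",
    ">3ior_C\n",
    "Found QQQQQQQQQQQQQ at 388 to 400 of length 13\n",
    "\n",
    ">3ios_A\n",
    "\n",
    ">3iot_A\n",
);

my $result = group_found(@input);
print $result;
```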

You can also make use of paragraph mode, which involves changing the input record separator $/ to make Perl consider a line ending at two newlines \n\n:

my $junk = <>;          # before changing $/ reads single line
$/ = "\n\n";            # input record separator 
$\ = "\n\n";            # output record separator (for print())
while (<>) {            # read paragraph
    chomp;
    my ($hdr, @lines) = split /(?=\n)/;    # split paragraph
    print $hdr, @lines if @lines;          # if @lines is empty, skip
}

This is not quite true paragraph mode: true paragraph mode sets the input record separator to the empty string ($/ = ""). But in this case, since we are taking the newlines out and putting them back, it is better to be consistent and use "\n\n" for both $/ and $\.

Also note that since we split the paragraphs with a look-ahead assertion (?=...), we are not actually removing the newlines, but keeping them for the print afterwards. We are, however, removing the paragraph-ending newlines with chomp.
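For comparison, here is a sketch of true paragraph mode ($/ = ""), reading the sample data from an in-memory string. One practical difference: with $/ = "" a run of several blank lines counts as a single separator, and chomp then removes all trailing newlines, whereas with $/ = "\n\n" any extra blank line would be left at the start of the next record. The sample data deliberately has two blank lines after the first group to show this.

```perl
use strict;
use warnings;

my $data = ">3ior_B\nFound PPPPPPPPPPP at 397 to 407 of length 11\n"
         . "Found QQQQQQQQQ at 388 to 396 of length 9\n\n\n"
         . ">3ior_C\nFound QQQQQQQQQQQQQ at 388 to 400 of length 13\n\n"
         . ">3ios_A\n\n"
         . ">3iot_A\n";

open my $in, '<', \$data or die "Cannot open in-memory data: $!";

my @kept;
{
    local $/ = "";                   # true paragraph mode: one or more blank lines end a record
    while (my $para = <$in>) {
        chomp $para;                 # in paragraph mode chomp removes every trailing newline
        my ($hdr, @rest) = split /\n/, $para;
        push @kept, $para if @rest;  # keep only headers followed by at least one line
    }
}
close $in;

print join("\n\n", @kept), "\n";
```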

To run any of the programs here:

perl script.pl input > output

If you just want to see the output on screen, leave out the redirection:

perl script.pl input

Toto

Give this a try:

# ALWAYS
use strict;
use warnings;

my $filein = "/home/guest/Desktop/hpresult.txt";
my $fileout = "/home/guest/Desktop/sortresult3";
# use 3-arg open
open my $LIST, '<', $filein or die "Unable to open '$filein': $!";
open my $OUT, '>', $fileout or die "Unable to open '$fileout': $!";

my $id;
while(my $line = <$LIST>) {
    chomp $line;
    if ($line =~ />/) {
        $id = $line;
    } elsif ($line =~ /Found/) {
        print $OUT $id,"\n" if $id;
        # id is printed only once
        $id = '';
        print $OUT $line,"\n";
    }
}
