Implementing This Algorithm with Less line of Code at Perl

Question

I want to implement this algorithm in Perl. Let's accept that:

DNA1 = GACTAGGC
DNA2 = AGCTAGGA

enter image description here

First element of DNA1 is G, we will find if there is G at DNA2 and point it with a dot. We continue it till the end so the image shows evey same element intersections as a dot.

The next step is that: connecting points. To point to dots first one should be at left top corner of a small square and the other one at right bottom(I mean lines should have a 135 degree) If stringency is 2, it means that reject the lines which has occured from 2 and less then 2 dots(It means that if stringency was 3, there would be just one line at image).

Last step is that: wordcount. If wordcount is 1(it is one at image) it means that compare elements one by one. If it was 3 it means that compare 3 of them together. You can write a program that wordcount is 1 because it is always 1.

I searched about it and this is what I have:

$infile1 = "DNA1.txt";
$infile2 = "DNA2.txt";

$outfile = "plot.txt";
$wordsize=0;
$stringency=0;

open inf, $infile1 or die "STOP! File $infile1 not found.
";
$sequence1=;
chomp $sequence1;
@seq1=split //,$sequence1;
close inf;

open inf, $infile2 or die "STOP! File $infile2 not found.
";
$sequence2=;
chomp $sequence2;
@seq2=split //,$sequence2;
close inf;

$Lseq1=$#seq1+1;
$Lseq2=$#seq2+1;

open ouf, ">$outfile";

for ($i=0;$i<$Lseq1;$i++){
print ouf "
";
for ($j=0;$j<$Lseq2;$j++){
  $match=0;
  for ($w=0;$w<=$wordsize;$w++){
    if($seq1[$i+$w] eq $seq2[$j+$w]){
      $match++;
    }
  }
  if($match > $stringency){
     print ouf "1";
  }
  else{
     print ouf "0";
  }
}
}

Can you check it about errors and how can I optimize my code with more less code at Perl?

PS: I think it is OK to accept $wordsize equals to $stringency every time.

EDIT 1: I have edited my code and it works for just puting dot.

EDIT 2: Algorithm is like that:

qseq, sseq = sequences
win = number of elements to compare for each point
Strig = number of matches required for a point

for each q in qseq:
  for each s in sseq:
    if CompareWindow(qseq[q:q+win], s[s:s+win], strig):
      AddDot(q, s)

EDIT 3: Here is a better algorithm suggestion:

osl.iu.edu/~chemuell/projects/bioinf/dotplot.ppt

Any idea to improve code according to that better algorithm is welcome.

ikegami · Accepted Answer

First, the innermost for loop is totally unnecessary. Getting rid of it will speed up your code.

Secondly, your code doesn't work for $stringency other than 0.

Fix:

use strict;
use warnings;

my $s1 = 'GACTAGGC';
my $s2 = 'AGCTAGGA';
my $stringency = 0;

my @s1 = split //, $s1;
my @s2 = split //, $s2;
my @L;
for my $i (0..$#s1) {
   for my $j (0..$#s2) {
      if ($s1[$i] ne $s2[$j]) {
         $L[$i][$j] = 0;
      } elsif ($i == 0 || $j == 0) {
         $L[$i][$j] = 1;
      } else {
         $L[$i][$j] = $L[$i-1][$j-1] + 1;
      }

      print $L[$i][$j] <= $stringency ? "0" : "1";
   }
   print("
");
}

Now that we have an efficient algorithm, we can tweak the implementation.

use strict;
use warnings;

my $s1 = 'GACTAGGC';
my $s2 = 'AGCTAGGA';
my $stringency = 0;

my @s1 = split //, $s1;
my @s2 = split //, $s2;
my @L = (0) x @s2;
for my $i (0..$#s1) {
   for my $j (0..$#s2) {
      if ($s1[$i] eq $s2[$j]) {
         ++$L[$j];
      } else {
         $L[$j] = 0;
      }

      print $L[$j] <= $stringency ? "0" : "1";
   }

   print("
");
   pop @L;
   unshift @L, 0;
}

If you want a better idea of what's going on, replace

print $L[$j] <= $stringency ? "0" : "1";

with

print $L[$j];

You'll get something like

By the way, what are trying to achieve is remarkably similar to finding the longest common substring.

Update To get $s1 and $s2 from files, one line at a time,

open(my $fh1, '<', ...) or die(...);
open(my $fh2, '<', ...) or die(...);

for (;;) {
    my $s1 = <$fh1>;
    my $s2 = <$fh2>;

    die("Files have different length
")
        if defined($s1) && !defined($s2)
        || !defined($s1) && defined($s2);

    last if !defined(($s1);

    chomp($s1);
    chomp($s2);

    ... code to generate graph ...
}

Implementing This Algorithm with Less line of Code at Perl

Answers (1)

Related Questions