kamaci
kamaci

Reputation: 75237

Implementing This Algorithm with Less line of Code at Perl

I want to implement this algorithm in Perl. Let's accept that:

enter image description here

First element of DNA1 is G, we will find if there is G at DNA2 and point it with a dot. We continue it till the end so the image shows evey same element intersections as a dot.

The next step is that: connecting points. To point to dots first one should be at left top corner of a small square and the other one at right bottom(I mean lines should have a 135 degree) If stringency is 2, it means that reject the lines which has occured from 2 and less then 2 dots(It means that if stringency was 3, there would be just one line at image).

Last step is that: wordcount. If wordcount is 1(it is one at image) it means that compare elements one by one. If it was 3 it means that compare 3 of them together. You can write a program that wordcount is 1 because it is always 1.

I searched about it and this is what I have:

$infile1 = "DNA1.txt";
$infile2 = "DNA2.txt";

$outfile = "plot.txt";
$wordsize=0;
$stringency=0;

open inf, $infile1 or die "STOP! File $infile1 not found.\n";
$sequence1=<inf>;
chomp $sequence1;
@seq1=split //,$sequence1;
close inf;

open inf, $infile2 or die "STOP! File $infile2 not found.\n";
$sequence2=<inf>;
chomp $sequence2;
@seq2=split //,$sequence2;
close inf;

$Lseq1=$#seq1+1;
$Lseq2=$#seq2+1;

open ouf, ">$outfile";

for ($i=0;$i<$Lseq1;$i++){
print ouf "\n";
for ($j=0;$j<$Lseq2;$j++){
  $match=0;
  for ($w=0;$w<=$wordsize;$w++){
    if($seq1[$i+$w] eq $seq2[$j+$w]){
      $match++;
    }
  }
  if($match > $stringency){
     print ouf "1";
  }
  else{
     print ouf "0";
  }
}
}

Can you check it about errors and how can I optimize my code with more less code at Perl?

PS: I think it is OK to accept $wordsize equals to $stringency every time.

EDIT 1: I have edited my code and it works for just puting dot.

EDIT 2: Algorithm is like that:

qseq, sseq = sequences
win = number of elements to compare for each point
Strig = number of matches required for a point

for each q in qseq:
  for each s in sseq:
    if CompareWindow(qseq[q:q+win], s[s:s+win], strig):
      AddDot(q, s)

EDIT 3: Here is a better algorithm suggestion:

osl.iu.edu/~chemuell/projects/bioinf/dotplot.ppt

Any idea to improve code according to that better algorithm is welcome.

Upvotes: 0

Views: 326

Answers (1)

ikegami
ikegami

Reputation: 386461

First, the innermost for loop is totally unnecessary. Getting rid of it will speed up your code.

Secondly, your code doesn't work for $stringency other than 0.

Fix:

use strict;
use warnings;

my $s1 = 'GACTAGGC';
my $s2 = 'AGCTAGGA';
my $stringency = 0;

my @s1 = split //, $s1;
my @s2 = split //, $s2;
my @L;
for my $i (0..$#s1) {
   for my $j (0..$#s2) {
      if ($s1[$i] ne $s2[$j]) {
         $L[$i][$j] = 0;
      } elsif ($i == 0 || $j == 0) {
         $L[$i][$j] = 1;
      } else {
         $L[$i][$j] = $L[$i-1][$j-1] + 1;
      }

      print $L[$i][$j] <= $stringency ? "0" : "1";
   }
   print("\n");
}

Now that we have an efficient algorithm, we can tweak the implementation.

use strict;
use warnings;

my $s1 = 'GACTAGGC';
my $s2 = 'AGCTAGGA';
my $stringency = 0;

my @s1 = split //, $s1;
my @s2 = split //, $s2;
my @L = (0) x @s2;
for my $i (0..$#s1) {
   for my $j (0..$#s2) {
      if ($s1[$i] eq $s2[$j]) {
         ++$L[$j];
      } else {
         $L[$j] = 0;
      }

      print $L[$j] <= $stringency ? "0" : "1";
   }

   print("\n");
   pop @L;
   unshift @L, 0;
}

If you want a better idea of what's going on, replace

print $L[$j] <= $stringency ? "0" : "1";

with

print $L[$j];

You'll get something like

01000110
10001002
00100000
00020000
10003001
02000410
01000150
00200000

By the way, what are trying to achieve is remarkably similar to finding the longest common substring.

Update To get $s1 and $s2 from files, one line at a time,

open(my $fh1, '<', ...) or die(...);
open(my $fh2, '<', ...) or die(...);

for (;;) {
    my $s1 = <$fh1>;
    my $s2 = <$fh2>;

    die("Files have different length\n")
        if defined($s1) && !defined($s2)
        || !defined($s1) && defined($s2);

    last if !defined(($s1);

    chomp($s1);
    chomp($s2);

    ... code to generate graph ...
}

Upvotes: 4

Related Questions