EA00

Perl: Search and Replace

I'm trying to improve my script, which should match the atom names in column 4 of input.txt (H1, 2HB, CA, HB3) against column 3 of dictionary.txt and replace them with the corresponding names from column 2 of dictionary.txt (H, HB, C, 3HB), using dictionary.txt as a lookup table:

input.txt

1  N  22  H1  MET
1  H  32 2HB  MET
1  C  40  CA  MET
2  H  35  HB3 ASP

dictionary.txt

MET  H   H1
MET  HB 2HB
MET  C   CA
ASP 3HB  HB3

output

1  N  22  H  MET
1  H  32  HB MET
1  C  40  C  MET
2  H  35 3HB ASP

I'm trying to approach this by first matching the residue name in input.txt (MET) with the one in dictionary.txt (MET) and then performing the substitution. This is what I've written so far:

#!/usr/bin/perl

use strict;
use warnings;

my %dictionary;

open my $dic_fh, '<', 'dictionary.txt' or die "Can't open file: $!";

while (my $ref = <$dic_fh>) {
    chomp $ref;
    my @columns  = split(/\t/, $ref);
    my $res_name = $columns[0];
    my $ref_nuc  = $columns[1];
    $dictionary{$res_name} = {$ref_nuc};

    open my $in_fh, '<', 'input.txt' or die "Can't open file: $!";

    while (my $line = <$in_fh>) {
        chomp $line;
        my @columns = split(/\t/, $line);
        my @name = $columns[3];
        if (my $name eq $res_name) {
            my $line = $_;
            foreach my $res_name (keys %dictionary) {
                $line =~ s/$name/$dictionary{$ref_nuc}/;
            }
            print $line;
        }
    }
}

Answers (1)

Borodin

The problem seems to be that you are assigning the single field $columns[3] to the array @name, and then expecting to find it in $name, which is a separate variable altogether. You even declare $name with my at the point of the comparison, so at that point it is always undefined.
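A minimal sketch of the extraction done with scalars, assuming whitespace-separated columns as in the sample data. Note that in input.txt the residue name (MET) is the fifth column, $columns[4], while $columns[3] holds the atom name, so comparing $columns[3] with the residue name from dictionary.txt could never match. The sub name residue_and_atom is hypothetical.

```perl
use strict;
use warnings;

# Split one line of input.txt into its residue name (fifth column)
# and atom name (fourth column). split ' ' splits on any run of
# whitespace, so tabs and multiple spaces both work.
sub residue_and_atom {
    my ($line) = @_;
    my @columns = split ' ', $line;
    return ($columns[4], $columns[3]);
}

my $res_name = 'MET';    # e.g. the first column of a dictionary.txt line

# Scalars are declared once, outside the condition, so the comparison
# sees the values read from the line instead of a fresh undef.
my ($residue, $atom) = residue_and_atom("1  N  22  H1  MET");
if ($residue eq $res_name) {
    print "residue $residue, atom $atom\n";
}
```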

You are also executing the statement

$line =~ s/$name/$dictionary{$ref_nuc}/;

once for each key in the hash. That is unnecessary: it needs to be done only once. It is also better to set $columns[3] to $dictionary{$columns[3]} than to do a search and replace on the whole line, because the target string may also appear in other columns that you don't want to modify.
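A small illustration of that difference, using a hypothetical line (not from input.txt) in which "CA" appears both as the element in column 2 and as the atom name in column 4. The sub names replace_whole_line and replace_field are illustrative only. s/// without /g changes only the first match in the line, and \Q...\E makes the name match literally rather than as a regex.

```perl
use strict;
use warnings;

# Whole-line search and replace: s/// without /g changes only the FIRST
# match anywhere in the line. \Q...\E makes $from match literally.
sub replace_whole_line {
    my ($line, $from, $to) = @_;
    (my $out = $line) =~ s/\Q$from\E/$to/;
    return $out;
}

# Field replacement: split on whitespace and touch only the fourth column.
sub replace_field {
    my ($line, $from, $to) = @_;
    my @fields = split ' ', $line;
    $fields[3] = $to if defined $fields[3] && $fields[3] eq $from;
    return join "\t", @fields;
}

# Hypothetical line: "CA" is the element (column 2) and the atom name (column 4).
my $line = "1  CA  40  CA  MET";
print replace_whole_line($line, 'CA', 'C'), "\n";   # 1  C  40  CA  MET
print replace_field($line, 'CA', 'C'), "\n";        # only column 4 becomes C
```

The whole-line version renames the element column and leaves the atom name untouched, which is exactly the kind of silent error the field-based approach avoids.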

It is very simple to do this by building a dictionary hash and replacing the fourth field of each input line with its dictionary lookup.

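use strict;
use warnings;
use 5.010;      # for say and the // operator
use autodie;    # open dies with a useful message on failure

# Build the lookup: third column (old atom name) => second column (new name)
open my $fh, '<', 'dictionary.txt';
my %dict;
while ( <$fh> ) {
  my ($k, $v) = (split)[2,1];
  $dict{$k} = $v;
}

# Replace only the fourth field of each input line
open $fh, '<', 'input.txt';
while ( <$fh> ) {
  my @fields = split;
  $fields[3] = $dict{$fields[3]} // $fields[3];   # keep names not in the dictionary
  say join "\t", @fields;
}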

output

1   N   22  H   MET
1   H   32  HB  MET
1   C   40  C   MET
2   H   35  3HB ASP
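The lookup above is keyed on the atom name alone. If, as the question's plan of matching MET first suggests, the same atom name could need a different replacement in different residues, one option is a nested hash keyed on residue name and then atom name. This is a minimal sketch, assuming whitespace-separated columns in the order shown in the question; the sub names load_dictionary and rename_atom are hypothetical, and the in-memory strings stand in for dictionary.txt and input.txt. Atom names with no entry are kept unchanged.

```perl
use strict;
use warnings;

# Build $dict->{residue}{old_name} = new_name from a filehandle whose
# lines are assumed to be: residue  new_name  old_name
sub load_dictionary {
    my ($fh) = @_;
    my %dict;
    while (my $line = <$fh>) {
        my ($res, $new, $old) = split ' ', $line;
        next unless defined $old;                # skip blank or short lines
        $dict{$res}{$old} = $new;
    }
    return \%dict;
}

# Rename the atom (fourth column) of one input line, looking it up
# under that line's residue (fifth column).
sub rename_atom {
    my ($dict, $line) = @_;
    my @fields = split ' ', $line;
    return $line if @fields < 5;                 # leave short or blank lines alone
    my ($atom, $res) = @fields[3, 4];
    my $new = $dict->{$res} && $dict->{$res}{$atom};
    $fields[3] = $new if defined $new;           # unknown names stay unchanged
    return join "\t", @fields;
}

# Sample data from the question; in practice open dictionary.txt and
# input.txt instead of these in-memory strings.
my $dictionary_txt = "MET  H   H1\nMET  HB 2HB\nMET  C   CA\nASP 3HB  HB3\n";
open my $dic_fh, '<', \$dictionary_txt or die "Can't open dictionary: $!";
my $dict = load_dictionary($dic_fh);
close $dic_fh;

for my $line ("1  N  22  H1  MET", "1  H  32 2HB  MET",
              "1  C  40  CA  MET", "2  H  35  HB3 ASP") {
    print rename_atom($dict, $line), "\n";
}
```

Reading $dict->{$res} before indexing into it avoids autovivifying empty entries for residues that are not in the dictionary.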
