I'm trying to improve my script, in which I want to match the values in column 4 of input.txt (H1, 2HB, CA, HB3) against dictionary.txt and replace them with the corresponding values from column 2 of dictionary.txt (H, HB, C, 3HB), using dictionary.txt as a dictionary:
input.txt
1 N 22 H1 MET
1 H 32 2HB MET
1 C 40 CA MET
2 H 35 HB3 ASP
dictionary.txt
MET H H1
MET HB 2HB
MET C CA
ASP 3HB HB3
output
1 N 22 H MET
1 H 32 HB MET
1 C 40 C MET
2 H 35 3HB ASP
I'm trying to approach this by first matching the word in input.txt (MET) against dictionary.txt (MET) and then performing the substitution. This is what I've written so far:
#!/usr/bin/perl
use strict;
use warnings;

my %dictionary;
open my $dic_fh, '<', 'dictionary.txt' or die "Can't open file: $!";
while (my $ref = <$dic_fh>) {
    chomp $ref;
    my @columns = split(/\t/, $ref);
    my $res_name = $columns[0];
    my $ref_nuc = $columns[1];
    $dictionary{$res_name} = {$ref_nuc};
    open my $in_fh, '<', 'input.txt' or die "Can't open file: $!";
    while (my $line = <$in_fh>) {
        chomp $line;
        my @columns = split(/\t/, $line);
        my @name = $columns[3];
        if (my $name eq $res_name) {
            my $line = $_;
            foreach my $res_name (keys %dictionary) {
                $line =~ s/$name/$dictionary{$ref_nuc}/;
            }
            print $line;
        }
    }
}
The problem seems to be that you are assigning the single field $columns[3] to the array @name, and then expecting to find it in $name, which is a separate variable altogether. You even declare $name at the point of the comparison, so it is a new, undefined variable when it is compared.
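To make the distinction concrete, here is a minimal sketch using one sample row from the question (the row is hard-coded for illustration rather than read from input.txt). It only shows that @name and $name are two independent variables that happen to share a name:

```perl
use strict;
use warnings;

# One sample row from input.txt, hard-coded for illustration.
my @columns = (1, 'N', 22, 'H1', 'MET');

my @name = $columns[3];   # a one-element array: ('H1')
my $name = $columns[3];   # a scalar holding 'H1'; unrelated to @name

print scalar(@name), "\n";   # number of elements in @name
print "$name\n";             # value of the scalar
```

Assigning to @name never sets $name, so the comparison needs the scalar to be assigned explicitly before it is used.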
You are also executing the statement
$line =~ s/$name/$dictionary{$ref_nuc}/;
once for each key in the hash. That is unnecessary: it needs to be done only once. It is also better to change the value of $columns[3] to $dictionary{$columns[3]} instead of doing a search and replace on the whole line, as the target string may appear in other columns that you don't want to modify.
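As a rough illustration of that risk, the sketch below uses a hypothetical row (not taken from input.txt) in which the column-4 value CA also appears in column 2. The helper names replace_in_line and replace_field are made up for this example:

```perl
use strict;
use warnings;

# Hypothetical row: the column-4 value 'CA' also occurs in column 2.
my $line = join "\t", 1, 'CA', 40, 'CA', 'MET';
my %dict = (CA => 'C');

# Whole-line substitution: s/// rewrites the first match it finds,
# which here is column 2 rather than column 4.
sub replace_in_line {
    my ($text, $from, $to) = @_;
    $text =~ s/\Q$from\E/$to/;
    return $text;
}

# Field replacement: only column 4 (index 3) is touched.
sub replace_field {
    my ($text, $dict) = @_;
    my @fields = split /\t/, $text;
    $fields[3] = $dict->{ $fields[3] } if exists $dict->{ $fields[3] };
    return join "\t", @fields;
}

print replace_in_line($line, 'CA', $dict{CA}), "\n";   # columns: 1 C 40 CA MET
print replace_field($line, \%dict), "\n";              # columns: 1 CA 40 C MET
```

The first call changes the wrong column, while the second leaves column 2 alone and updates only the fourth field.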
It is very simple to do by building a dictionary hash and replacing the fourth field of the input file with its dictionary lookup:
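use strict;
use warnings;
use 5.010;
use autodie;

# Build the lookup: the key is column 3 of dictionary.txt, the value is column 2
open my $fh, '<', 'dictionary.txt';
my %dict;
while ( <$fh> ) {
    my ($k, $v) = (split)[2,1];
    $dict{$k} = $v;
}

# Replace the fourth field of each input line with its dictionary entry
open $fh, '<', 'input.txt';
while ( <$fh> ) {
    my @fields = split;
    $fields[3] = $dict{$fields[3]};
    say join "\t", @fields;
}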
output
1 N 22 H MET
1 H 32 HB MET
1 C 40 C MET
2 H 35 3HB ASP
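If the same atom name could map differently depending on the residue, which is what the question's "match MET first" approach suggests, the lookup can be keyed on both columns. The following is only a sketch under that assumption: the rows are copied from the question and held in arrays instead of being read from the files, and build_dict and translate_row are illustrative names. It also leaves the field unchanged when no entry exists, which is a design choice rather than something the question specifies:

```perl
use strict;
use warnings;

# Sample rows copied from the question; a real script would read the files.
my @dictionary_rows = ('MET H H1', 'MET HB 2HB', 'MET C CA', 'ASP 3HB HB3');
my @input_rows      = ('1 N 22 H1 MET', '1 H 32 2HB MET',
                       '1 C 40 CA MET', '2 H 35 HB3 ASP');

# Build a two-level hash: $dict->{residue}{old_name} = new_name.
sub build_dict {
    my %dict;
    for my $row (@_) {
        my ($residue, $new, $old) = split ' ', $row;
        $dict{$residue}{$old} = $new;
    }
    return \%dict;
}

# Replace column 4 using the entry for the residue in column 5;
# the field is left as-is when the dictionary has no entry for it.
sub translate_row {
    my ($dict, $row) = @_;
    my @fields = split ' ', $row;
    my ($old, $residue) = @fields[3, 4];
    $fields[3] = $dict->{$residue}{$old} if exists $dict->{$residue}{$old};
    return join "\t", @fields;
}

my $dict = build_dict(@dictionary_rows);
print translate_row($dict, $_), "\n" for @input_rows;
```

For the sample data this prints the same four rows as the output above; the difference only shows up when two residues share an atom name or when a row has no dictionary entry.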