Peter

Reputation: 27

(Perl) Searching a file for text from another file

I've spent a few hours on this portion of my code and still have no idea how to make it work, so any suggestions would be great!


I have 2 files, list1.txt and dictionary.txt. list1.txt looks like

rs1
rs2
rs4
rs5

while dictionary.txt looks like

rs1 1 A G
rs2 2 C T
rs3 3 A A
rs4 4 G G

There are four columns, separated by spaces. For each word in list1.txt, I want to search dictionary.txt for that word and, if it exists, print the entire row from dictionary.txt to a third file. If the word doesn't exist in dictionary.txt, just print the word.

So, if I were to run the program below with the files listed above, my result should look like

rs1 1 A G
rs2 2 C T
rs4 4 G G
rs5

The aforementioned program:

open(LIST1, '<', 'test_chr1_22.txt') or die "Could not open chr1_22.txt: $!";

open(OUTPUT, '>', 'test_chr1_22_all_info.txt');

foreach my $line1 (<LIST1>)
{
        foreach my $line (@DICT)
        {
            if ($line =~ m/"$line1"/)
            {
                print OUTPUT"$line\n";
            }
        }
}

This is the code I have right now. I know it doesn't handle my second condition, where a word that isn't in the dictionary should just be printed on its own. However, I can't even get the first part to work, where a word that is in the dictionary should cause its row to be printed. What I get from this is a blank text file. Does anyone know what's going on?

Upvotes: 1

Views: 3184

Answers (1)

ikegami

Reputation: 386631

m/"$line1"/ is wrong for numerous reasons:

  • None of the strings you match against contains a " character, so this will never match.
  • You don't escape the contents of $line1, so any regex metacharacters in the text would be treated as pattern syntax instead of literal characters.
  • You only want to match if the text is found at the beginning of the string.
  • You only want to match if the text is the entire field, not just a prefix of it.
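
For illustration, here is a minimal sketch of a match that addresses the four points above. The helper name first_field_matches is hypothetical, and the chomp is an added assumption: lines read from a file still carry their trailing newline, which would also prevent a match in the middle of a row.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word is exactly the first
# whitespace-separated field of $row.
sub first_field_matches {
    my ($word, $row) = @_;
    chomp $word;    # assumption: $word may still end in "\n"

    # ^           anchor at the beginning of the row
    # \Q...\E     treat the word as literal text, not as a pattern
    # (?:\s|\z)   the field must end here (whitespace or end of string)
    return $row =~ /^\Q$word\E(?:\s|\z)/ ? 1 : 0;
}

print first_field_matches("rs1\n", "rs1 1 A G\n"),   "\n";   # prints 1
print first_field_matches("rs1\n", "rs10 10 A G\n"), "\n";   # prints 0
```

The second call shows why the field boundary matters: without it, rs1 would also match the row for rs10.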

Anyway, once you replace the extremely inefficient nested loops with a single loop and a hash lookup, the need for a regex match disappears.

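# $DICT and $INPUT are assumed to be filehandles already opened for reading
# on dictionary.txt and list1.txt, respectively.

# Index every dictionary row by its first whitespace-separated field.
my %dict;
while (<$DICT>) {
   my ($key) = split;
   $dict{$key} = $_;
}

# One hash lookup per word: print the matching row, or the word itself.
while (<$INPUT>) {
   my ($key) = split;
   print $dict{$key} // $_;
}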
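
To try the approach without any files on disk, here is a self-contained sketch that runs it on the sample data from the question. The sub name lookup_rows and the in-memory filehandles are assumptions made for the demonstration, not part of the original code, and the // operator requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: reads dictionary rows and list words from two
# already-opened filehandles and returns the output lines.
sub lookup_rows {
    my ($dict_fh, $list_fh) = @_;

    # Index each dictionary row by its first whitespace-separated field.
    my %dict;
    while (my $row = <$dict_fh>) {
        my ($key) = split ' ', $row;
        next unless defined $key;    # skip blank lines
        $dict{$key} = $row;
    }

    # One hash lookup per word; fall back to the word itself.
    my @out;
    while (my $word = <$list_fh>) {
        my ($key) = split ' ', $word;
        next unless defined $key;
        push @out, $dict{$key} // $word;
    }
    return @out;
}

# In-memory filehandles stand in for dictionary.txt and list1.txt.
my $dict_text = "rs1 1 A G\nrs2 2 C T\nrs3 3 A A\nrs4 4 G G\n";
my $list_text = "rs1\nrs2\nrs4\nrs5\n";
open(my $DICT,  '<', \$dict_text) or die "Could not open dictionary data: $!";
open(my $INPUT, '<', \$list_text) or die "Could not open list data: $!";

# Prints the three matching rows followed by the bare word rs5.
print lookup_rows($DICT, $INPUT);
```

To send the result to a third file instead of standard output, open a handle with '>' and print to that handle.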

Upvotes: 3
