user1987607

Reputation: 2157

If non-empty file, compare columns and write matching lines to new file

I want to write a script that does the following: the user can choose to import a .txt file (I have already written the code for this; here it is $list1). This file consists of only one column, with a name on each line. If the user imported a file that is not empty, then I want to compare the names in a column of another file (here $file2) with the names in the imported file. If there is a match, the whole line of that original file ($file2) should be written to a new file ($filter1).

This is what I have so far:

my $list1;

if (prompt_yn("Do you want to import a genelist for filtering?")){
    my $genelist1 = prompt("Give the name of the first genelist file:\n");
    open($list1,'+<',$genelist1) or die "Could not open file $genelist1 $!";
}

open(my $filter1,'+>',"filter1.txt") || die "Can't write new file: $!";

my %hash1=();
while(<$list1>){ # $list1 is the variable from the imported .txt file
    chomp;    
    next unless -z $_;
    my $keyfield= $_; # this imported file contains only one column
    $hash1{$keyfield}++;
}

seek $file2,0,0; #cursor resetting
while(<$file2>){ # this is the other file with multiple columns
    my @line=split(/\t/);  # split on tabs
    my $keyfield=$line[2]; # values to compare are in column 3       
    if (exists($hash1{$keyfield})){
        print $filter1 $_;
    }
}   

When I run this script, my output file filter1.txt is empty. That is not correct, because there are definitely matches between the columns.

Upvotes: 0

Views: 188

Answers (1)

Vorsprung

Reputation: 34387

Because you have declared the $list1 filehandle as a lexical ("my") variable inside a block, it is only visible in that block.

So the later lines in your script can't see $list1, which gives the error message you mentioned.

To fix this, declare $list1 before the if block that opens the file.
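A minimal sketch of that fix, assuming an in-memory string ($names, hypothetical sample data) in place of the user's genelist and a plain flag ($user_wants_list, hypothetical) in place of the question's prompt_yn call, so it runs without any files. Because $list1 is declared outside the if block, the loop after the block can still read from it. The defined check is an extra guard, not from the original answer, for the case where the user declines to import a list.

```perl
use strict;
use warnings;

# Hypothetical in-memory "file" standing in for the user's genelist.
my $names = "BRCA1\nTP53\n";
my $user_wants_list = 1;        # hypothetical stand-in for prompt_yn(...)

my $list1;                      # declared OUTSIDE the block, so visible below it
if ($user_wants_list) {
    open($list1, '<', \$names) or die "Could not open genelist: $!";
}

# $list1 is still in scope here; it is undef if the user declined.
my @seen;
if (defined $list1) {
    while (my $line = <$list1>) {
        chomp $line;
        push @seen, $line;
    }
}
print "read: @seen\n";          # prints "read: BRCA1 TP53"
```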

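As the script stands, it doesn't set any keys or values in %hash1: the line next unless -z $_; treats each name as a filename, and since -z is only true for an existing, empty file of that name, every line is skipped.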
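One likely reason, in the question's loading loop, is the line next unless -z $_;. -z is a file test ("does a file with this name exist and have zero size?"), not a test on the string's contents. This short sketch, with made-up gene names as sample data, contrasts it with a check on the line itself, which is probably what "skip empty lines" was meant to do.

```perl
use strict;
use warnings;

# -z asks the filesystem about a FILE called "BRCA1". Normally no such file
# exists, so -z returns false and `next unless -z $_;` skips the line.
my $name = "BRCA1";                 # hypothetical gene name
my $z    = -z $name;
print "-z on '$name': ", ($z ? "true" : "false"), "\n";

# To skip blank lines instead, test the line's contents:
my @lines = ("BRCA1", "", "   ", "TP53");
my @kept  = grep { /\S/ } @lines;   # keep lines with a non-space character
print "kept: @kept\n";              # prints "kept: BRCA1 TP53"
```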

Your spec is a little fuzzy, but what you probably intend is to load the %hash1 keys from the first file ($list1):

while(<$list1>){            # $list1 is the variable from the imported .txt file
    chomp;                  # remove newlines
    my $keyfield=$_;        # this imported file contains only one column
    $hash1{$keyfield}++;    # this sets a key in %hash1
}
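The same loop can be wrapped in a small helper. load_keys is a hypothetical name and the in-memory genelist is sample data. This sketch also skips blank lines and strips a trailing carriage return, on the assumption that the list might have been saved on Windows; otherwise a key like "EGFR\r" would never equal "EGFR".

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash from a one-column list.
# Accepts any readable filehandle.
sub load_keys {
    my ($fh) = @_;
    my %keys;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r\z//;          # tolerate Windows (CRLF) line endings
        next unless length $line;   # skip empty lines
        $keys{$line}++;             # count occurrences; existence is what matters
    }
    return \%keys;
}

# Usage with an in-memory file standing in for the genelist:
my $genelist = "BRCA1\nTP53\n\nEGFR\r\n";
open(my $list1, '<', \$genelist) or die "Cannot open in-memory list: $!";
my $hash1 = load_keys($list1);
print join(",", sort keys %$hash1), "\n";   # prints "BRCA1,EGFR,TP53"
```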

Then, when reading through $file2:

while(<$file2>){                      # this is the other file with multiple columns
    my @line=split(/\t/);             # split on tabs
    my $keyfield=$line[2];            # values to compare are in column "2"
    if (exists($hash1{$keyfield}) ){  # do hash lookup for exact match
        print $_;                     # output entire line
    }
}
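A sketch of the lookup step as a helper (filter_lines is a hypothetical name; the rows are made-up sample data). One difference from the loop above: it chomps each line before splitting. If the compared column happens to be the last one, split leaves the newline attached to it (e.g. "EGFR\n"), and the exact hash lookup would then fail.

```perl
use strict;
use warnings;

# Hypothetical helper: print every line whose field at index $col
# is a key in %$keys.
sub filter_lines {
    my ($in, $out, $keys, $col) = @_;
    while (my $line = <$in>) {
        chomp $line;                          # otherwise the LAST column keeps "\n"
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        next unless defined $fields[$col];    # short line: no such column
        print {$out} "$line\n" if exists $keys->{ $fields[$col] };
    }
}

my %hash1 = (TP53 => 1, EGFR => 1);
my $data  = "r1\tx\tTP53\tA\nr2\tx\tKRAS\tB\nr3\ty\tEGFR\n";
my $result = '';
open(my $file2, '<', \$data)    or die "Cannot open data: $!";
open(my $out,   '>', \$result)  or die "Cannot open output buffer: $!";
filter_lines($file2, $out, \%hash1, 2);
close $out;
print $result;    # rows r1 and r3 (EGFR matches only because of the chomp)
```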

Incidentally, $line[2] is actually column 3: the first column is $line[0], the second is $line[1], and so on.
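A quick check of the zero-based indexing, using a hypothetical header row:

```perl
use strict;
use warnings;

# Perl arrays are zero-based: index 2 is the THIRD tab-separated column.
my @line = split /\t/, "gene_id\tchrom\tsymbol\tscore";   # hypothetical row
print "$line[0]\n";   # gene_id  (column 1)
print "$line[2]\n";   # symbol   (column 3)
```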

If you actually want a partial or pattern match (like grep) rather than an exact match, then a hash lookup isn't appropriate.
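If a partial match is what you need, one common alternative (swapped in here, not part of the original answer) is to build a single regular expression from all the names. This sketch assumes "partial" means the name may appear anywhere in column 3; build_matcher is a hypothetical helper, and quotemeta keeps characters such as . or + in a name from being read as regex syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: one regex that matches if ANY name occurs in a string.
sub build_matcher {
    my @names = @_;
    return qr/(?!)/ unless @names;          # no names: a regex that never matches
    my $alt = join '|', map { quotemeta($_) } @names;
    return qr/$alt/;
}

my $re   = build_matcher("TP53", "BRCA");
my @col3 = ("TP53-AS1", "BRCA2", "KRAS");   # sample column-3 values
my @hits = grep { $_ =~ $re } @col3;
print "@hits\n";   # prints "TP53-AS1 BRCA2"
```

Note that this is substring matching, so "BRCA" also matches "BRCA2"; anchor the pattern (for example with \b) if that is too loose.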

Finally, if you want the output in a file, you will have to change print $_; # output entire line so that it prints to that file's handle. I removed the reference to $filter1 because it isn't declared in the script fragment shown.
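Putting the pieces together, here is a sketch of the whole task that writes matches to a file instead of STDOUT. It assumes a hypothetical command-line interface (names file, tab-separated data file, output path) in place of the question's prompt_yn/prompt calls, and treats an empty names file as "nothing to filter": filter_by_list (a hypothetical name) returns 0 without creating the output file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy lines of the data file whose field at index $col
# appears in the names file to the output. Returns the number of lines written.
sub filter_by_list {
    my ($names_src, $data_src, $out_src, $col) = @_;

    open(my $list1, '<', $names_src) or die "Cannot open names file: $!";
    my %hash1;
    while (my $name = <$list1>) {
        chomp $name;
        $hash1{$name}++ if length $name;     # skip blank lines
    }
    close $list1;

    # "If non-empty": with no names there is nothing to match, so stop here.
    return 0 unless %hash1;

    open(my $file2,   '<', $data_src) or die "Cannot open data file: $!";
    open(my $filter1, '>', $out_src)  or die "Cannot write output: $!";
    my $count = 0;
    while (my $line = <$file2>) {
        chomp $line;
        my @fields = split /\t/, $line;
        next unless defined $fields[$col];
        if (exists $hash1{ $fields[$col] }) {
            print {$filter1} "$line\n";      # write to the file, not STDOUT
            $count++;
        }
    }
    close $file2;
    close $filter1 or die "Cannot close output: $!";
    return $count;
}

# Usage: perl filter.pl genelist.txt data.tsv filter1.txt
if (@ARGV == 3) {
    my $n = filter_by_list(@ARGV, 2);        # index 2 = column 3
    print "wrote $n matching line(s) to $ARGV[2]\n";
}
```

Because open also accepts a reference to a scalar, filter_by_list can be exercised with in-memory strings instead of real files, which is handy for testing.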

Upvotes: 1
