Carmen Sandoval

Reputation: 2356

Print each hash key and its values in new file?

I have made a hash of hashes, where all the lines of a file are sorted into a key of the "master" hash depending on the value of their 5th field.

%Tiles has n keys, where each key is a different $Tile_Number.

The value of each element of %Tiles is a reference to a hash that contains all the lines whose $Tile_Number matches that key. The value stored under each of these inner keys (the lines) is just 1.

$Tiles{$Tile_Number}{$Line}=1 , where $Tiles{$Tile_Number} has many $Line=1 entries.
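To make the structure concrete, here is a minimal sketch that builds it from three made-up lines. The sample lines are hypothetical and only mimic the shape of my real data: a ":"-separated read ID whose 5th field is the tile number, followed by tab-separated columns.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the real file contents.
my @sample_lines = (
    "M1:7:FC1:1:1101:100:200\t0\tchr1",
    "M1:7:FC1:1:1102:300:400\t16\tchr2",
    "M1:7:FC1:1:1101:500:600\t0\tchr3",
);

my %Tiles;
for my $Line (@sample_lines) {
    # Same split as in the real script: tabs and colons both separate fields.
    my @Alignment_Line = split /\t+|:/, $Line;
    my $Tile_Number    = $Alignment_Line[4];
    $Tiles{$Tile_Number}{$Line} = 1;
}

# Report how many lines ended up under each tile number.
for my $Tile_Number (sort keys %Tiles) {
    my $count = keys %{ $Tiles{$Tile_Number} };
    print "$Tile_Number: $count line(s)\n";
}
```

With these three lines, %Tiles gets two first-level keys (1101 and 1102), and the inner hash for 1101 holds two lines.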

I want to print each $Tiles{$Tile_Number} hash in a separate file, preferably, creating the file upon the creation of the $Tile_Number key, and printing as each new $Tiles{$Tile_Number}{$Line}=1 is added, to save memory.

Ideally the final value (1) would not be printed, but I can live with it if it is.

How can I tell Perl to open a new file for each key in the "master" hash and print all of its keys?

Code:

use strict;
use warnings;


my ($Line) = "";
my (@Alignment_Line) = ();
my (%Tiles) = ();

my $USAGE = "Usage: $0 <BAM file>\n";    # assumed usage message, not from my full script

my $Huge_BAM_File= $ARGV[0] or die $USAGE;

open(HUGE_BAM_FILE,"< $Huge_BAM_File") || die "Sorry I couldn't open the INPUT file:   $Huge_BAM_File !\n";

while(<HUGE_BAM_FILE>){

    ### Remove new line characters "\n"
    ### Split each line by "\t" and by ":" (for fields within READ ID FIELD)
    chomp;
    $Line = $_;
    @Alignment_Line = split(/\t+|\:/, $Line);

    my $Tile_Number = $Alignment_Line[4];


    ##########################################################
    ### Fill in hash of hashes %Tiles                      ###
    ### First key is $Tile_Number                          ###
    ### Second key is $Line, with a value of 1             ###
    ### Each first-level key holds all the alignments      ###
    ### with that tile number                              ###
    ##########################################################

    $Tiles{$Tile_Number}{$Line} = 1;
    ## Here, I would like to write this new entry into the corresponding file,
    ## and maybe remove it from the hash so the program doesn't run out of memory.
}

close(HUGE_BAM_FILE);

Upvotes: 0

Views: 3527

Answers (1)

Borodin

Reputation: 126722

I think you should have a hash of arrays, not a hash of hashes. However, it sounds like you can print out your hashes like this:

while (my ($tile, $lines) = each %Tiles) {
    open my $fh, '>', "$tile.txt" or die $!;
    # The lines were chomped when read, so put the newline back
    print $fh "$_\n" for keys %$lines;
    close $fh or die $!;
}

Note that the lines won't be in the same order they were read. You would have to use an array for that.
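As a rough sketch of the hash-of-arrays idea, something like the following keeps the lines for each tile in the order they were read. The sample lines are hypothetical stand-ins for your file, and the "$tile.txt" file names are just the same naming assumption as above.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines of the real file.
my @sample_lines = (
    "M1:7:FC1:1:1101:100:200\t0\tchr1",
    "M1:7:FC1:1:1102:300:400\t16\tchr2",
    "M1:7:FC1:1:1101:500:600\t0\tchr3",
);

my %Tiles;
for my $line (@sample_lines) {
    # The 5th field after splitting on tabs and colons is the tile number.
    my $tile = (split /[\t:]/, $line)[4];
    push @{ $Tiles{$tile} }, $line;    # autovivifies the array on first use
}

# Write one file per tile, with lines in their original reading order.
for my $tile (sort keys %Tiles) {
    open my $fh, '>', "$tile.txt"
        or die qq{Unable to open "$tile.txt" for output: $!};
    print $fh "$_\n" for @{ $Tiles{$tile} };
    close $fh
        or die qq{Unable to close "$tile.txt": $!};
}
```

Unlike the hash of hashes, an array keeps duplicate lines, so if you were relying on the hash to discard repeats you would need to handle that separately.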

I'm not clear about your idea of printing as each line is added and saving memory. Do you mean you want to print each line instead of adding it to the hash? Perhaps you should show us your complete code.


Update

Here's an alternative you may like. It doesn't store the data from the file at all. Instead it extracts the tile number from each line as it reads it, and writes to the file corresponding to that number.

There is a hash of filehandles that has the tile numbers as keys, and each time a line is read the hash is checked to see if there is already a filehandle for that tile number. If not then a new one is opened before writing the line.

use strict;
use warnings;

my $USAGE = "Usage: $0 <bam_file>\n";    # assumed usage message

my $bam_file = $ARGV[0] or die $USAGE;

open my $bam, '<', $bam_file
    or die qq{Unable to open "$bam_file" for input: $!};

my %filehandles;

while (<$bam>) {
    chomp(my $line = $_);    # chomp a copy; $_ keeps its newline for printing
    my @fields = split /[\t:]/, $line;
    my $tile = $fields[4];
    unless ($filehandles{$tile}) {
      my $file = "$tile.txt";
      open $filehandles{$tile}, '>', $file
          or die qq{Unable to open "$file" for output: $!};
    }
    # A filehandle stored in a hash element must be wrapped in a block for print
    print { $filehandles{$tile} } $_;
}

while (my ($tile, $fh) = each %filehandles) {
  close $fh
      or warn qq{Unable to close file for tile number $tile: $!};
}

Upvotes: 2
