abraham
abraham

Reputation: 743

perl extract multiples lines from hash

I have a tab delimited text file, something like this:

data    S1  S2  S3  S4  S5  S6
data1   0   0   0   0   0   0
data2   0   5   3   5   0.1 0.9
data3   0   3   9   3   0   0.01
data4   0   0   4   4   0   0
data5   2   5   11  7   5   0.2
data6   0   0   0   8.  0   0
data7   0   1   5   2   06  0.04

Well, the structure of the file is just little more complex, is a metagenomics file, something like:

D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__Marine Group II;D_5__uncultured archaeon 0 0 0 0 0 0 0 0 0.0035 0.00293 0.00834 0

from D_0__ to D_5__ = first column (data in the example) each number represent each column (S)

but at the end, is similar !!!!

what I want, is to use the %row hash to extract the first line (data) with a single @label_match (s3), and print out in a single txt file, I mean, if I want s3 and s6, print out something like this:

S3_file.txt (take the name of each column to print out the name of the file):

s3   data #avoid this line in the print out, just to explain !!!
0    data1
3    data2
9    data3
4    data4
11   data5
0    data6
5    data7

and

S6_file.txt:

0    data1
0.9  data2
0.01 data3
0    data4
0.2  data5
0    data6
0.04 data7

I have this code, and I think that in the %row section i have to make a foreach loop, to extract each @label_match one by one, but I don't know how. This is my code:

#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);
use Getopt::Long;
use List::Util qw(sum);


my ($infile_taxon, $search_label, $output_file, $help, $help_desc, $options, $options_desc, $keep_file);

GetOptions (
    't=s'       =>\$infile_taxon,
    's=s'       =>\$search_label,
    'kf'        =>\$keep_file,
    'o=s'       =>\$output_file,
    'h'         =>\$help,
    'op'        =>\$options
);

#---------------------------Subrutin to clean the selected Taxon  --------------------
sub Taxon_Clean {
    my (@clean_result);
    foreach (@_){
        chomp;
        if ($_ =~ s/D_0__//g | s/;D_\d__/\t/g | s/;/\t/g){
            push @clean_result, $_;
        }
    }

    return @clean_result;
}

#------------------------------------------------------ Open Files-------------------
open INFILE_TAXONOMY, '<', "$infile_taxon" or die $!;

my (@taxon, @sample_names);

#------------------------------------------------------ Taxon -----------------------
my ( @header, @label_match, @not_match, @taxon_filter);
while (<INFILE_TAXONOMY>){
    chomp;
    if ($_=~ m/^$|Constructed from biom file/g)  {
        next;
    }
    elsif ($_=~ s/OTU ID/Taxon/g){
        chomp ( @header = split '\t', $_ );

#------------------------------------------------------ Search Label ----------------
        if ($search_label){
            my @label_wanted= split (/\,/, $search_label); 
            unshift @label_wanted, '#Taxon';
            @label_wanted = uniq (@label_wanted);
            foreach (@label_wanted){
                my $unit =$_;
                chomp $unit;
                if (my @match_wanted= grep (/$unit/, @header)){
                    push (@label_match, @match_wanted);
                }
                else {
                    push (@not_match, $unit);
                }
            }

#                                --------- Check Point ---------

            push (my @defined_elements, @label_match);
            shift @defined_elements;

            if (! @defined_elements){
                print "\n\tNON of the Search Samples \" $search_label \" "
                  . "Were Found in \" $infile_taxon \" File !!!\n\n";
                exit;
            }

            elsif (grep {defined($_)} @defined_elements){  
                if (grep {defined($_)} @not_match){
                    print "\n\tSamples No Found: @not_match\n\n";
                }
            }
        }
    }
    elsif ($_=~ m/^#/g){
        next;
    }

    elsif ($search_label) {  
        my %row;
        @row{@header} = split '\t'; 
        my @filter= join "\t", @row{@label_match}, "\n";
        push (@taxon_filter, @filter);
        #print Dumper (\%row);
    }
    else {
        push (@taxon, $_); 
    }
}


# The Next section is to extract all the wanted columns in a single file,
# but here is where I want to extract one by one column i a separate file !!!



open OUTPUT, '>', "Taxonomic_results_file.txt", or die "can't create the output file";

foreach (@taxon_filter){
    chomp $_;
    my ($tax, @values) = split '\t', $_;
    my $unit_val = join("\t", map { $_ } @values);
    my $sum_elements = sum (@values);
    if ($sum_elements == 0){
        next;
    }
    else {
        push (my @tx, $tax);
        @tx = Taxon_Clean (@tx);
        print OUTPUT "$unit_val\t@tx\n";
    }
}


close INFILE_TAXONOMY;
close OUTPUT;
exit;

Thanks So Much

Upvotes: 0

Views: 84

Answers (1)

Sobrique
Sobrique

Reputation: 53508

You're already a lot of the way there with the @row{@header} type syntax. That takes a hash-slice, which means you can match multiple elements based on hash keys.

Output works much the same

open ( my $s3_file, '>', 'S3_file.txt' ) or warn $!;
my @output_fields = qw ( s3 data ); #matches column headings

And lower down inside the %row block:

print {$s3_file} join ("\t", @row{@output_fields} )), "\n"; 

Upvotes: 1

Related Questions