Reputation: 743
I have a tab delimited text file, something like this:
data S1 S2 S3 S4 S5 S6
data1 0 0 0 0 0 0
data2 0 5 3 5 0.1 0.9
data3 0 3 9 3 0 0.01
data4 0 0 4 4 0 0
data5 2 5 11 7 5 0.2
data6 0 0 0 8. 0 0
data7 0 1 5 2 06 0.04
Well, the structure of the file is just little more complex, is a metagenomics file, something like:
D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__Marine Group II;D_5__uncultured archaeon 0 0 0 0 0 0 0 0 0.0035 0.00293 0.00834 0
from D_0__ to D_5__ = first column (data in the example) each number represent each column (S)
but at the end, is similar !!!!
what I want, is to use the %row hash to extract the first line (data) with a single @label_match (s3), and print out in a single txt file, I mean, if I want s3 and s6, print out something like this:
S3_file.txt (take the name of each column to print out the name of the file):
s3 data #avoid this line in the print out, just to explain !!!
0 data1
3 data2
9 data3
4 data4
11 data5
0 data6
5 data7
and
S6_file.txt:
0 data1
0.9 data2
0.01 data3
0 data4
0.2 data5
0 data6
0.04 data7
I have this code, and I think that in the %row section i have to make a foreach loop, to extract each @label_match one by one, but I don't know how. This is my code:
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);
use Getopt::Long;
use List::Util qw(sum);
my ($infile_taxon, $search_label, $output_file, $help, $help_desc, $options, $options_desc, $keep_file);
GetOptions (
't=s' =>\$infile_taxon,
's=s' =>\$search_label,
'kf' =>\$keep_file,
'o=s' =>\$output_file,
'h' =>\$help,
'op' =>\$options
);
#---------------------------Subrutin to clean the selected Taxon --------------------
sub Taxon_Clean {
my (@clean_result);
foreach (@_){
chomp;
if ($_ =~ s/D_0__//g | s/;D_\d__/\t/g | s/;/\t/g){
push @clean_result, $_;
}
}
return @clean_result;
}
#------------------------------------------------------ Open Files-------------------
open INFILE_TAXONOMY, '<', "$infile_taxon" or die $!;
my (@taxon, @sample_names);
#------------------------------------------------------ Taxon -----------------------
my ( @header, @label_match, @not_match, @taxon_filter);
while (<INFILE_TAXONOMY>){
chomp;
if ($_=~ m/^$|Constructed from biom file/g) {
next;
}
elsif ($_=~ s/OTU ID/Taxon/g){
chomp ( @header = split '\t', $_ );
#------------------------------------------------------ Search Label ----------------
if ($search_label){
my @label_wanted= split (/\,/, $search_label);
unshift @label_wanted, '#Taxon';
@label_wanted = uniq (@label_wanted);
foreach (@label_wanted){
my $unit =$_;
chomp $unit;
if (my @match_wanted= grep (/$unit/, @header)){
push (@label_match, @match_wanted);
}
else {
push (@not_match, $unit);
}
}
# --------- Check Point ---------
push (my @defined_elements, @label_match);
shift @defined_elements;
if (! @defined_elements){
print "\n\tNON of the Search Samples \" $search_label \" "
. "Were Found in \" $infile_taxon \" File !!!\n\n";
exit;
}
elsif (grep {defined($_)} @defined_elements){
if (grep {defined($_)} @not_match){
print "\n\tSamples No Found: @not_match\n\n";
}
}
}
}
elsif ($_=~ m/^#/g){
next;
}
elsif ($search_label) {
my %row;
@row{@header} = split '\t';
my @filter= join "\t", @row{@label_match}, "\n";
push (@taxon_filter, @filter);
#print Dumper (\%row);
}
else {
push (@taxon, $_);
}
}
# The Next section is to extract all the wanted columns in a single file,
# but here is where I want to extract one by one column i a separate file !!!
open OUTPUT, '>', "Taxonomic_results_file.txt", or die "can't create the output file";
foreach (@taxon_filter){
chomp $_;
my ($tax, @values) = split '\t', $_;
my $unit_val = join("\t", map { $_ } @values);
my $sum_elements = sum (@values);
if ($sum_elements == 0){
next;
}
else {
push (my @tx, $tax);
@tx = Taxon_Clean (@tx);
print OUTPUT "$unit_val\t@tx\n";
}
}
close INFILE_TAXONOMY;
close OUTPUT;
exit;
Thanks So Much
Upvotes: 0
Views: 84
Reputation: 53508
You're already a lot of the way there with the @row{@header}
type syntax. That takes a hash-slice, which means you can match multiple elements based on hash keys.
Output works much the same
open ( my $s3_file, '>', 'S3_file.txt' ) or warn $!;
my @output_fields = qw ( s3 data ); #matches column headings
And lower down inside the %row
block:
print {$s3_file} join ("\t", @row{@output_fields} )), "\n";
Upvotes: 1