Reputation: 51
I am trying to split a big file into separate files, one for each variable named in the file (the last column of the #CHROM header lines), each containing only the information for that variable.
My input file looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PID008SM
...info here 1.....
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CL001-SC
....info here 2....
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CL001-SC
....info here 3....
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PID008SM
....info here 4....
In this case I would like to create two output files (one for PID008SM and one for CL001-SC), each with the information related to it.
Output for CL001-SC:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CL001-SC
....info here 2...
....info here 3...
Output for PID008SM:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PID008SM
....info here 1....
....info here 4....
The script that I have used is in Perl, but any suggestion is more than welcome. Thank you in advance.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my $file1 = $ARGV[0];
my $file2 = $ARGV[1];
open(my $f1, '<', $file1) or die "Cannot open $file1: $!";  # Opens first .vcf file for comparison
open(my $f2, '<', $file2) or die "Cannot open $file2: $!";  # 2nd for comparison
my %file;
## Create a hash key for each line of file2
while (<$f2>) {
    #chomp;
    $file{$_} = '';
}
## Print the line if its key exists in the hash
while (my $string = <$f1>) {
    if ( exists $file{$string} and $string =~ /(#)(.+?)(#)/s ) {
        print $string;
    }
}
Upvotes: 0
Views: 148
Reputation: 126742
Something like this perhaps?
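use strict;
use warnings;
open my $fh, '<', 'chrom.txt' or die $!;
my %fh;    # output file handles, keyed by the name in the last header column
while (<$fh>) {
    if ( /^#CHROM/ ) {
        my $name = (split)[-1];
        # Header seen before: switch back to its file and skip the repeated header
        if ($fh{$name}) {
            select $fh{$name};
            next;
        }
        # First occurrence: create the output file and make it the default for print
        my $file = "$name.txt";
        open $fh{$name}, '>', $file or die qq{Unable to open "$file" for output: $!};
        print STDOUT qq{Created file "$file"\n};
        select $fh{$name};
    }
    print;
}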
Upvotes: 1
Reputation: 2253
awk '/^#CHROM/{typ=$10;a[$0]++} a[$0]<2{print >> typ}' inputFile
This awk script seems to work.
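If it helps, here is a slightly more explicit variant of the one-liner above, as a rough sketch rather than a definitive solution. It assumes the sample name is the last whitespace-separated field of each #CHROM header (as in the question's example) and that any lines before the first header can be dropped. The file name input.vcf, the .txt suffix on the output files, and the "info N" lines are made up for illustration. Unlike the one-liner, it only stores the sample names in the array instead of touching an array entry for every line of the input.

```shell
# Run in an empty scratch directory: this writes input.vcf and one .txt file per sample.
# Sample input with the same layout as in the question (the data lines are hypothetical).
cat > input.vcf <<'EOF'
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PID008SM
info 1
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CL001-SC
info 2
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CL001-SC
info 3
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PID008SM
info 4
EOF

awk '
/^#CHROM/ {
    out = $NF ".txt"          # current output file, named after the last header field
    if (seen[$NF]++) next     # header already written for this sample: skip the repeat
}
out != "" { print > out }     # first header and all data lines go to the current file
' input.vcf

# Inspect the results
cat PID008SM.txt
cat CL001-SC.txt
```

Within a single awk run, "print > file" truncates the file only the first time it is opened, so switching back to an earlier sample keeps appending to the same file. With a very large number of distinct samples, some awk implementations may run into a limit on open files; calling close(out) before switching and using ">>" instead would be one way around that.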
Upvotes: 0