Reputation: 2331
The details of what the script is doing aren't important, but I have put comments on the lines that seem important to me. I'm only concerned with why I am getting blank lines in my output.
When I run the command
./script.pl temp temp.txt tempF `wc -l temp | awk '{print $1}'`
The temp file contains
1 27800000 120700000 4
1 27800000 124300000 4
1 154800000 247249719 3
3 32100000 71800000 9
3 32100000 87200000 2
3 54400000 74200000 15
4 76500000 155100000 20
4 76500000 182600000 3
4 76500000 88200000 77
4 88200000 124000000 2
5 58900000 180857866 8
5 58900000 76400000 2
5 58900000 97300000 4
5 76400000 143100000 14
5 97300000 147200000 6
6 7000000 29900000 2
6 63500000 70000000 73
6 63500000 92100000 4
6 70000000 113900000 70
6 70000000 139100000 57
6 92100000 113900000 3
And I am getting output of the form
hs1 27800000 124300000 4
hs3 32100000 87200000 2
hs3 54400000 74200000 15
hs4 76500000 182600000 3
hs4 76500000 88200000 77
hs4 88200000 124000000 2
hs5 58900000 76400000 2
hs5 58900000 97300000 4
hs5 76400000 143100000 14
hs5 97300000 147200000 6
hs6 63500000 92100000 4
hs6 70000000 139100000 57
hs6 92100000 113900000 3
This is printed to standard output. (About 8 of the lines are also written to the temp.txt file, but those are formatted correctly.)
The script is below:
#!/usr/bin/perl
# ARGV[0] is the name of the file which data will be read from (may have overlaps)
# ARGV[1] is the name of the file which will be produced that will have no overlaps
# ARGV[2] is the name of the folder which will hold all the data
# ARGV[3] is the number of lines that ARGV[0] will contain
use warnings;

my $file = "./$ARGV[0]";
my @lines = do {
    open my $fh, '<', $file or die "Can't open $file -- $!";
    <$fh>;
};

my $file2 = "./$ARGV[2]/$ARGV[1]";
open( my $files, ">", "$file2" ) or die "Can't open > $file2: $!";

my $i = 0;
while ( $i < $ARGV[3] - 1 ) {
    my @ref_fields = split( '\s+', $lines[$i] );
    print $files
        "$ref_fields[0]", "\t",
        $ref_fields[1], "\t",
        $ref_fields[2], "\t",
        $ref_fields[3], "\n";
    for my $j ( $i + 1 .. $ARGV[3] - 1 ) {
        $i = $j;
        # @curr_fields is initialized here
        my @curr_fields = split /\s+/, $lines[$j];
        if ( $ref_fields[0] eq $curr_fields[0] && $ref_fields[2] > $curr_fields[1] ) {
            if ( defined( $curr_fields[0] ) && $curr_fields[0] !~ /\s+/ ) {
                chomp $curr_fields[3];
                # the line below is the one that is printing to standard output
                print
                    $curr_fields[0], "\t",
                    $curr_fields[1], "\t",
                    $curr_fields[2], "\t",
                    $curr_fields[3], "\n";
            }
        }
        else {
            last;
        }
    }
    print "\n";
}
.
.
Edit:
I noticed an error when running the script from the posted answer. When I run the command
./script.pl temp1 temp10.txt folder
Where temp1 contains
12 58100000 96200000 0.04348
3 74200000 87200000 0.04348
5 130600000 168500000 0.04348
6 61000000 114600000 0.04348
6 75900000 114600000 0.04348
6 88000000 114600000 0.04348
6 88000000 139000000 0.04348
6 93100000 161000000 0.04348
6 105500000 139000000 0.04348
6 130300000 139000000 0.04348
7 59900000 77500000 0.04348
7 98000000 132600000 0.04348
X 67800000 76000000 0.08696
Y 28800000 59373566 0.04348
I get
6 75900000 114600000 0.04348
6 88000000 114600000 0.04348
6 88000000 139000000 0.04348
6 93100000 161000000 0.04348
6 105500000 139000000 0.04348
And temp10.txt contains
12 58100000 96200000 0.04348
3 74200000 87200000 0.04348
5 130600000 168500000 0.04348
6 61000000 114600000 0.04348
6 130300000 139000000 0.04348
7 59900000 77500000 0.04348
7 98000000 132600000 0.04348
X 67800000 76000000 0.08696
The line
Y 28800000 59373566 0.04348
is in neither the output nor temp10.txt. It seems to have disappeared, but it should have been printed to one of these.
Upvotes: 0
Views: 710
Reputation: 126722
It seems obvious that the blank lines are printing because you have a line
print "\n";
in your code
I can't help much more because you say "The details of what the script is doing isn't important", and so withhold from us what it's meant to be doing
However, what you have written prints lines from the input file as long as the first column matches the first column in the previous line and the second field is less than the third field in the previous line. Any time you get a line that doesn't qualify in this way you are printing a blank line
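If the aim is to keep a separator between groups of overlapping lines but not print one when a reference line has no overlaps (an assumption, since the question doesn't say), one option is to track whether anything was printed for the current reference line. This is only a minimal sketch: the rows are the first five lines of your temp file, and `@out` and `$printed` are names invented here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First five rows of the temp file from the question, split into fields
my @lines = map { [split] } (
    "1 27800000 120700000 4",
    "1 27800000 124300000 4",
    "1 154800000 247249719 3",
    "3 32100000 71800000 9",
    "3 32100000 87200000 2",
);

my @out;    # output collected here (instead of printed directly) so it is easy to inspect
my $i = 0;
while ( $i < $#lines ) {
    my $ref     = $lines[$i];
    my $printed = 0;            # set once this reference line yields an overlap
    for my $j ( $i + 1 .. $#lines ) {
        $i = $j;
        my $curr = $lines[$j];
        # Same test as the original: same first column, start below the reference end
        last unless $curr->[0] eq $ref->[0] && $curr->[1] < $ref->[2];
        push @out, join( "\t", @$curr ) . "\n";
        $printed = 1;
    }
    # Emit the separator only after a group that actually produced output
    push @out, "\n" if $printed;
}
print @out;
```

If no separator is wanted at all, simply deleting the `print "\n";` line from the original script is enough.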
You may prefer this refactoring of your code, which applies the same logic (apart from no longer printing the blank line) but I think is much more readable. It also has the advantage of splitting each line of the input file only once, and there is no need for the fourth parameter, as the number of lines is simply the size of the @lines array. Blank lines are removed from the file as they are read, so there's no longer a need for your check on the definedness of the first field.
#!/usr/bin/perl
# ARGV[0] is the name of the file which data will be read from (may have overlaps)
# ARGV[1] is the name of the file which will be produced that will have no overlaps
# ARGV[2] is the name of the folder which will hold all the circos data file (mitelmanAll, mitelmanProstate, etc.)
use strict;
use warnings 'all';

use File::Path 'make_path';
use File::Spec::Functions 'catfile';

my ($file, $newfile, $dir) = @ARGV;
$newfile = catfile($dir, $newfile);

# Read the input, skip blank lines, and split each line into fields once
my @lines = do {
    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
    map { [ split ] } grep /\S/, <$fh>;
};

make_path($dir);
open my $out_fh, '>', $newfile or die qq{Unable to open "$newfile" for output: $!};

for ( my $i = 0; $i < $#lines; ) {

    # The reference line goes to the output file
    my $ref_fields = $lines[$i];
    print $out_fh join("\t", @$ref_fields[0..3]), "\n";

    for my $j ( $i + 1 .. $#lines ) {

        $i = $j;
        my $curr_fields = $lines[$j];

        # eq rather than ==, as in the original: names such as X and Y are not numeric
        last unless $curr_fields->[0] eq $ref_fields->[0];
        last unless $curr_fields->[1] < $ref_fields->[2];

        # Lines that overlap the reference line go to standard output
        print join("\t", @$curr_fields[0..3]), "\n";
    }
}
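One edge case follows from the loop bound `$i < $#lines`, which mirrors the `$ARGV[3] - 1` limit in the original script. If the last line of the input fails the overlap test, the inner loop sets `$i` to the final index and exits, so the outer loop ends before that line can be printed as a reference line. That would account for a trailing line such as `Y 28800000 59373566 0.04348` appearing in neither output. Below is a minimal sketch of one way to restructure the loop, assuming every line that does not overlap its predecessor should be kept. It uses the last three rows of temp1, and `@kept` and `@overlaps` are illustrative names standing in for the output file and standard output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Last three rows of temp1 from the question, split into fields
my @lines = map { [split] } (
    "7 98000000 132600000 0.04348",
    "X 67800000 76000000 0.08696",
    "Y 28800000 59373566 0.04348",
);

my ( @kept, @overlaps );
my $i = 0;
while ( $i <= $#lines ) {    # <= so the final row can also become a reference row
    my $ref = $lines[$i];
    push @kept, $ref;
    $i++;

    # Consume the rows that overlap the reference row
    while ( $i <= $#lines ) {
        my $curr = $lines[$i];
        last unless $curr->[0] eq $ref->[0] && $curr->[1] < $ref->[2];
        push @overlaps, $curr;
        $i++;
    }
}

print "kept:    ", join( "\t", @$_ ), "\n" for @kept;
print "overlap: ", join( "\t", @$_ ), "\n" for @overlaps;
```

Here the index only advances past a row once it has been placed in one of the two lists, so no row can fall through, including the last one.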
Upvotes: 2