Sam

Reputation: 2331

Why are blank lines appearing in my Perl script's output?

The details of what the script is doing isn't important, but I have put comments on the lines that seem important to me. I'm only concerned with why I am getting blank lines in my output.

When I run the command

./script.pl temp temp.txt tempF `wc -l temp | awk '{print $1}'`

The temp file contains

1   27800000    120700000   4
1   27800000    124300000   4
1   154800000   247249719   3
3   32100000    71800000    9
3   32100000    87200000    2
3   54400000    74200000    15
4   76500000    155100000   20
4   76500000    182600000   3
4   76500000    88200000    77
4   88200000    124000000   2
5   58900000    180857866   8
5   58900000    76400000    2
5   58900000    97300000    4
5   76400000    143100000   14
5   97300000    147200000   6
6   7000000 29900000    2
6   63500000    70000000    73
6   63500000    92100000    4
6   70000000    113900000   70
6   70000000    139100000   57
6   92100000    113900000   3

And I am getting output of the form

hs1 27800000    124300000   4


hs3 32100000    87200000    2
hs3 54400000    74200000    15

hs4 76500000    182600000   3
hs4 76500000    88200000    77
hs4 88200000    124000000   2

hs5 58900000    76400000    2
hs5 58900000    97300000    4
hs5 76400000    143100000   14
hs5 97300000    147200000   6


hs6 63500000    92100000    4

hs6 70000000    139100000   57
hs6 92100000    113900000   3

That is what appears on standard output. (About 8 lines are also written to the temp.txt file, and those are formatted correctly.)

Here is the script:

#!/usr/bin/perl

# ARGV[0] is the name of the file which data will be read from(may have overlaps)
# ARGV[1] is the name of the file which will be produced that will have no overlaps
# ARGV[2] is the name of the folder which will hold all the data  
# ARGV[3] is the number of lines that ARGV[0] will contain

use warnings;

my $file  = "./$ARGV[0]";
my @lines = do {
    open my $fh, '<', $file or die "Can't open $file -- $!";
    <$fh>;
};

my $file2 = "./$ARGV[2]/$ARGV[1]";
open( my $files, ">", "$file2" ) or die "Can't open > $file2: $!";

my $i = 0;
while ( $i < $ARGV[3] - 1 ) {

    my @ref_fields = split( '\s+', $lines[$i] );

    print $files
        "$ref_fields[0]", "\t",
        $ref_fields[1], "\t",
        $ref_fields[2], "\t",
        $ref_fields[3], "\n";

    for my $j ( $i + 1 .. $ARGV[3] - 1 ) {

        $i = $j;

        # @curr_fields is initialized here

        my @curr_fields = split /\s+/, $lines[$j];

        if ( $ref_fields[0] eq $curr_fields[0] && $ref_fields[2] > $curr_fields[1] ) {

            if ( defined( $curr_fields[0] ) && $curr_fields[0] !~ /\s+/ ) {

                chomp $curr_fields[3];

                # the line below is the one that is printing to standard output
                print
                    $curr_fields[0], "\t",
                    $curr_fields[1], "\t",
                    $curr_fields[2], "\t",
                    $curr_fields[3], "\n";
            }
        }
        else {
            last;
        }
    }

    print "\n";
}


Edit:

I noticed a problem when running the script from the posted answer. When I run the command

./script.pl temp1 temp10.txt folder

Where temp1 contains

12  58100000    96200000    0.04348
3   74200000    87200000    0.04348
5   130600000   168500000   0.04348
6   61000000    114600000   0.04348
6   75900000    114600000   0.04348
6   88000000    114600000   0.04348
6   88000000    139000000   0.04348
6   93100000    161000000   0.04348
6   105500000   139000000   0.04348
6   130300000   139000000   0.04348
7   59900000    77500000    0.04348
7   98000000    132600000   0.04348
X   67800000    76000000    0.08696
Y   28800000    59373566    0.04348

I get

6   75900000    114600000   0.04348
6   88000000    114600000   0.04348
6   88000000    139000000   0.04348
6   93100000    161000000   0.04348
6   105500000   139000000   0.04348

And temp10.txt contains

12  58100000    96200000    0.04348
3   74200000    87200000    0.04348
5   130600000   168500000   0.04348
6   61000000    114600000   0.04348
6   130300000   139000000   0.04348
7   59900000    77500000    0.04348
7   98000000    132600000   0.04348
X   67800000    76000000    0.08696

The line

Y   28800000    59373566    0.04348

is in neither the output nor temp10.txt. It seems to have disappeared, but it should have been printed to one of them.

Upvotes: 0

Views: 710

Answers (1)

Borodin

Reputation: 126722

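The blank lines are being printed because you have the line

print "\n";

at the end of your outer while loop, so a newline is output every time a group ends, whether or not anything was printed for that group.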
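To make that concrete, here is a minimal sketch with made-up data (render_groups and the sample strings are hypothetical, not from the question). Each inner array stands for the lines the inner for loop prints for one reference line. Appending a newline after every group, as the question's code does, produces a blank line even for groups that printed nothing; appending it only after non-empty groups keeps a separator without the stray blank lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the text the question's loop would print. $always_newline mimics the
# unconditional print "\n"; when it is false, the newline is added only after
# groups that printed at least one line.
sub render_groups {
    my ( $groups, $always_newline ) = @_;
    my $out = '';
    for my $group (@$groups) {
        $out .= "$_\n" for @$group;
        $out .= "\n" if $always_newline || @$group;
    }
    return $out;
}

# Hypothetical groups: two of the reference lines had no overlapping lines.
my @groups = ( [], ['hs3 line'], [], [ 'hs4 line a', 'hs4 line b' ] );

print "--- unconditional newline ---\n", render_groups( \@groups, 1 );
print "--- newline only after non-empty groups ---\n", render_groups( \@groups, 0 );
```

If you don't want any separator at all, deleting the print "\n"; line is enough.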

I can't help much more, because you say "The details of what the script is doing isn't important" and so have withheld from us what it's meant to be doing.

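However, what you have written prints each line from the input file for as long as its first column matches the first column of the reference line (the line that was just written to the output file, not the previous line) and its second field is less than the reference line's third field. As soon as a line doesn't qualify, the inner loop ends, you print a blank line, and that line becomes the next reference line. That is why you also get a blank line, sometimes two in a row, for reference lines that had no overlapping lines at all.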
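The rule described above can be written as a small predicate. This is a sketch: overlaps_ref is a hypothetical name, and each record is assumed to be an array ref of the four whitespace-separated fields. The sample records come from the question's temp file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True when $curr continues the group that starts at $ref: the first columns
# match as strings and $curr starts before $ref ends. This is the same test
# as the if statement in the question's inner loop.
sub overlaps_ref {
    my ( $ref, $curr ) = @_;
    return $curr->[0] eq $ref->[0] && $curr->[1] < $ref->[2];
}

my $ref = [ 3, 32100000, 71800000, 9 ];
for my $curr (
    [ 3, 32100000, 87200000,  2 ],
    [ 3, 54400000, 74200000,  15 ],
    [ 4, 76500000, 155100000, 20 ],
    )
{
    my $verdict = overlaps_ref( $ref, $curr ) ? 'printed' : 'ends the group';
    print join( "\t", @$curr ), "\t=> $verdict\n";
}
```

Every candidate is compared with the same reference line, not with the line immediately before it.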



You may prefer this refactoring of your code. It does the same job without printing the blank lines, and I think it is much more readable. It also splits each line of the input file only once, and there is no need for the fourth parameter because the number of lines is simply the size of the @lines array. Blank lines in the input are discarded as the file is read, so there's no longer any need for your check on whether the first field is defined.
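The reading step, map { [ split ] } grep /\S/, <$fh>, is what makes the fourth argument unnecessary. Here is a small sketch of what it produces, using an in-memory string as a stand-in for the input file (the sample text is made up from the question's data; reading from a scalar reference needs Perl 5.8 or later):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory stand-in for the input file, including a blank line.
my $text = "1   27800000    120700000   4\n"
         . "\n"
         . "3   32100000    71800000    9\n";

my @lines = do {
    # Opening a reference to a scalar reads from the string itself.
    open my $fh, '<', \$text or die "Unable to open in-memory file: $!";

    # grep /\S/ drops lines that are empty or all whitespace. split with no
    # arguments splits $_ on whitespace and ignores leading whitespace, and
    # the trailing newline does not produce an extra field.
    map { [split] } grep /\S/, <$fh>;
};

# The record count comes from the array itself, so wc -l is not needed.
printf "%d records\n", scalar @lines;
print join( '|', @$_ ), "\n" for @lines;
```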

#!/usr/bin/perl

# ARGV[0] is the name of the file which data will be read from (may have overlaps)
# ARGV[1] is the name of the file which will be produced that will have no overlaps
# ARGV[2] is the name of the folder which will hold all the circos data files (mitelmanAll, mitelmanProstate, etc.)

use strict;
use warnings 'all';

use File::Path 'make_path';
use File::Spec::Functions 'catfile';

my ($file, $newfile, $dir) = @ARGV;
$newfile = catfile($dir, $newfile);

my @lines = do {
    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
    map { [ split ] } grep /\S/, <$fh>;
};

make_path($dir);
open my $out_fh, '>', $newfile or die qq{Unable to open "$newfile" for output: $!};

for ( my $i = 0; $i < $#lines; ) {

    my $ref_fields = $lines[$i];

    print $out_fh join("\t", @$ref_fields[0..3]), "\n";

    for my $j ( $i + 1 .. $#lines ) {

        $i = $j;

        my $curr_fields = $lines[$j];

        last unless $curr_fields->[0] eq $ref_fields->[0];
        last unless $curr_fields->[1] <  $ref_fields->[2];

        print join("\t", @$curr_fields[0..3]), "\n";
    }
}
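Regarding the missing Y line in the Edit, two details of the loop are worth checking. First, the outer condition ($i < $#lines here, $i < $ARGV[3] - 1 in the question's script) means that when the final line does not overlap the current group, $i is set to the last index, the inner loop ends, and the outer loop exits before that line is ever printed as a new reference line. Second, comparing the first column with == treats X and Y as equal, because both strings numify to 0 (with an "isn't numeric" warning). Here is a sketch that avoids both by sending every record to exactly one list. split_overlaps is a hypothetical name, and it returns lists instead of printing so the behaviour is easy to inspect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Partition records so each one lands in exactly one list. The first record
# of a group goes to @kept (the output file in the scripts above); records
# that overlap it go to @overlapping (standard output above). The first
# column is compared with eq, so X and Y stay distinct.
sub split_overlaps {
    my ($records) = @_;
    my ( @kept, @overlapping );
    my $i = 0;
    while ( $i <= $#$records ) {
        my $ref = $records->[$i];
        push @kept, $ref;
        my $j = $i + 1;
        while ( $j <= $#$records
            && $records->[$j][0] eq $ref->[0]
            && $records->[$j][1] < $ref->[2] )
        {
            push @overlapping, $records->[$j];
            $j++;
        }
        $i = $j;    # the first non-overlapping record starts the next group
    }
    return ( \@kept, \@overlapping );
}

# A few records from temp1 in the Edit, split into fields as in the answer.
my @records = map { [split] } (
    '6   61000000    114600000   0.04348',
    '6   75900000    114600000   0.04348',
    '6   130300000   139000000   0.04348',
    'X   67800000    76000000    0.08696',
    'Y   28800000    59373566    0.04348',
);

my ( $kept, $overlapping ) = split_overlaps( \@records );
print 'kept:        ', join( "\t", @$_ ), "\n" for @$kept;
print 'overlapping: ', join( "\t", @$_ ), "\n" for @$overlapping;
```

In the answer's script you would print the records in @$kept to $out_fh and those in @$overlapping to standard output. Driving the loop by an explicit index that always advances to the first non-overlapping record means the last record is handled like any other.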

Upvotes: 2
