gapvision
gapvision

Reputation: 1029

Memory leak in program that uses LWP::UserAgent to download a file

I am trying to revive a Perl script I was using a long time ago. It is for downloading files from cloud storage to my local client. I'm pretty sure it worked fine back then, but now I am having the issue that LWP::UserAgent downloads the file entirely into the memory before writing it to disk. Expected and former behaviour would be that it should write chunks of the received file to the target during download.

I'm am trying it currently on OSX with Perl 5.16.3 and 5.18 and also tried it on Windows but I do not know the Perl version any more. I am pretty confident that this is related to the Perl version, but I do not know which I used back then and I want to know what changed.

sub downloadFile {

    my $url           = shift;
    my $filename      = shift;
    my $temp_filename = shift;
    my $expected_size = shift;

    (   $download_size, $received_size, $avg_speed,   $avg_speed_s, $avg_speed_q,
        $speed_count,   $speed,         $byte_offset, $http_status
    ) = ( 0, 0, 0, 0, 0, 0, 0, 0, 0 );

    if ( -e $temp_filename and !$options{'no-resume'} ) {

        my @stat = stat($temp_filename);

        if ( $expected_size > $stat[7] ) {
            $byte_offset   = $stat[7];
            $received_size = $stat[7];
        }
    }

    open DOWNLOAD, ( $byte_offset > 0 ) ? ">>" : ">", $temp_filename
            or die "Unable to create download file: $!";
    binmode DOWNLOAD;

    $last_tick = time();

    my $host = "myhost";

    if ( $url =~ m/http:\/\/(.*?)\//gi ) {
        $host = $1;
    }

    $agent->credentials(
            $host . ":80",
            "Login Required",
            $config->{"account_name"},
            $config->{"account_password"} );

    my $response = $agent->get(
            $url,
            ':content_cb'     => \&didReceiveData,
            ':read_size_hint' => ( 2**14 ) );

    close DOWNLOAD;

    my @stat        = stat($temp_filename);
    my $actual_size = $stat[7];

    if ( ! $response->is_success() ) {

        printfvc( 0,
                "\rDownload failed: %s",
                'red',
                $response->status_line() );

        return 0;
    }
    elsif ( $actual_size != $expected_size ) {

        printfvc( 0,
                "\rDownloaded file does not have expected size (%s vs. %s)",
                'red',
                $actual_size, $expected_size );

        return 0;
    }
    else {

        rename $temp_filename, $filename;

        printfvc( 0,
                "\rDownload succeeded                                                           ",
                'green' );

        return 1;
    }
}

sub didReceiveData {

    my ( $data, $cb_response, $protocol ) = @_;

    #my($response, $ua, $h, $data) = @_;
    my $data_size = scalar( length($data) );
    $received_size += $data_size;
    $speed_count   += $data_size;

    my $now = time();

    if ( $last_tick < $now ) {
        $speed       = $speed_count;
        $speed_count = 0;
        $last_tick   = $now;
        $avg_speed_q++;
        $avg_speed_s += $speed;
        $avg_speed = $avg_speed_s / $avg_speed_q;
    }

    if ( $download_size > 0 and $http_status eq "200" or $http_status eq "206" ) {

        print DOWNLOAD $data;

        printf("-> %.1f %% (%s of %s, %s/s) %s      ",
                ( $received_size / $download_size ) * 100,
                fsize($received_size),
                fsize($download_size),
                fsize($speed),
                $avg_speed_q > 3
                ? fduration( ( $download_size - $received_size ) / $avg_speed ) . " remaining"
                : ""
        ) if ( $verbosity >= 0 );
    }
    else {
        printf("-> Initiating transfer...") if ( $verbosity >= 0 );
    }

    return 1;
}

output:

mun-m-sele:PutIO-Perl-folder-sync sele$ perl putiosync.pl 
Syncing folder 'Test' to '/Users/sele/Downloads/Test'...
1 files queued to download
5MB.zip
Fetching '5MB.zip' [1 of 1]

-> 0.3 % (16.0 kiB of 5.0 MiB, 16.0 kiB/s)       
-> 0.6 % (32.0 kiB of 5.0 MiB, 16.0 kiB/s)       
-> 0.9 % (48.0 kiB of 5.0 MiB, 16.0 kiB/s)       
 .
 . 
 .      
-> 99.1 % (5.0 MiB of 5.0 MiB, 16.0 kiB/s)       
-> 99.4 % (5.0 MiB of 5.0 MiB, 16.0 kiB/s)       
-> 99.7 % (5.0 MiB of 5.0 MiB, 16.0 kiB/s)       
Download succeeded

So output is as expected BUT still this output only appears after the file has been loaded into memory.

The content_cb is not called during the download (tested by simply putting a print("cb") to the top of didReceiveData

update

I found out that it works as expected on Windows Strawberry Perl 5.16.2. I can provide you with package versions if you tell me which and how ;)

Upvotes: 1

Views: 632

Answers (3)

Borodin
Borodin

Reputation: 126722

Your own code contains a lot of irrelevances, like resume support, multiple server support, progress logging, site credentials, temporary download files, error handling, and average speed calculations. None of these are relevant to the core problem that you described, and that is why I asked you to create a Minimal, Complete, and Verifiable example. I don't understand your refusal, or why you seem to be clinging to the idea that the error is in Perl and not in your own code

Without that, all I can do is demonstrate that the technique works well. Here is the sort of thing that you should have generated as a demonstration of the problem. It is very little different from your own code, and it works fine. It downloads an official ISO image of the Ubuntu desktop distribution which is about 1.4GB of information. The process uses a steady 17MB of memory and finishes in 14 minutes. The size of the resultant file exactly matches the Content-Length specified in the HTTP header

Beyond this no one can help you further. I encourage you to accept the help of experts when you have asked for it. It's also worth noting that the problem will often be revealed by the process of creating an MCVE from your faulty program: you are very likely to delete a non-essential part of the code and find that the issue has disappeared

use strict;
use warnings 'all';

use LWP;

use constant ISO_URL => 'http://releases.ubuntu.com/16.04/ubuntu-16.04-desktop-amd64.iso';

STDOUT->autoflush;

my $ua = LWP::UserAgent->new;

my $expected;
{
    my $res = $ua->head(ISO_URL);
    $expected = $res->header('Content-Length');
    printf "Expected file size is %.3fMB\n",  $expected / 1024**2;
}

my ($iso_file) = ISO_URL =~ m{([^/]+)\z};
open my $iso_fh, '>:raw', $iso_file or die $!;
my $total;
my $pc = 0;

{
    my $res = $ua->get(
        ISO_URL,
        ':content_cb'     => \&content_cb,
        ':read_size_hint' => 16 * 1024,
    );

    close $iso_fh or die $!;

    print $res->status_line, "\n";
    printf "Final file size is %.3fMB\n", (-s $iso_file) / 1024**2;
}

sub content_cb {

    my ( $data, $res ) = @_;

    die $res->status_line unless $res->is_success;

    print $iso_fh $data;

    $total += length $data;
    while ( $pc < 100 * $total / $expected ) {
        printf "%3d%%\n", $pc++;
    }
}

output

Expected file size is 1417.047MB
  0%
  1%
  2%
  3%
  4%
  5%
  :
  :
 95%
 96%
 97%
 98%
 99%
200 OK
Final file size is 1417.047MB

Upvotes: 2

Borodin
Borodin

Reputation: 126722

A major problem with your code is that $http_status is never assigned. It can be set only by the callback didReceiveData or after the entire download has completed when the get call exits

But your callback tests whether $http_status eq "200" (which should be $cb_response->is_success) before printing to the DOWNLOAD file handle, so nothing can ever be written

I can believe that your code escalates memory because it endlessly prints -> Initiating transfer... to STDOUT, but nothing will ever be written to the temporary file because of the untested HTTP status. I am certain that you watched your process run and die with an Out of memory error and instantly blamed Perl without even trying to download a 1KB file. Your code has never worked, and your question and support of those who would help you is outrageous

"I'm pretty sure it worked fine back then" isn't a great start, but when you then reject on that basis all applications of a solution or a request for information then you are being ridiculous

Upvotes: 0

J Singh
J Singh

Reputation: 162

Could it be that the problem is with the file I/O rather than LWP? I assume that data is not being flushed to the file till you close the file.

Below is an example code on how to make File handle flush data to hard disk:

{ my $ofh = select LOG;
  $| = 1;
  select $ofh;
}

Check out perldoc -q flush and this interesting article on buffering, "Suffering from Buffering?".

Upvotes: 1

Related Questions