Peter

Reputation: 788

jsdom and node.js leaking memory

I found a few references to people having a similar issue, where the answer was always to make sure you call window.close() when done. However, that does not seem to be working for me (node 0.8.14 and jsdom 0.3.1).

A simple repro:

var util = require('util');
var jsdom = require('jsdom');

function doOne(i) {
  var htmlDoc = '<html><head></head><body id="' + i + '"></body></html>';
  jsdom.env(htmlDoc, null, null, function (errors, window) {
    window.close();
  });
}

for (var i = 1; i < 100000; i++) {
  doOne(i);
  if (i % 500 == 0) {
    console.log(i + ":" + util.inspect(process.memoryUsage()));
  }
}
console.log("done");

Output I get is

500:{ rss: 108847104, heapTotal: 115979520, heapUsed: 102696768 }
1000:{ rss: 198250496, heapTotal: 194394624, heapUsed: 190892120 }
1500:{ rss: 267304960, heapTotal: 254246912, heapUsed: 223847712 }
...
11000:{ rss: 1565204480, heapTotal: 1593723904, heapUsed: 1466889432 }

At this point the fan goes wild and the test actually stops... or at least starts going very slowly.

Does anyone have any tips other than window.close() for getting rid of the memory leak (or what sure looks like a memory leak)?

Thanks!

Peter

Upvotes: 18

Views: 5088

Answers (6)

Magnus

Reputation: 8300

JSDom defers certain operations using process.nextTick.

When calling JSDom in a synchronous loop, the event loop never gets to run. Thus references to old JSDom window objects are retained and node runs out of memory.

Running some asynchronous code allows the event loop to run, which should resolve the memory leak.

You can simply use setTimeout with a zero delay (inside an async function) to let the event loop run:

await new Promise(resolve => setTimeout(resolve, 0));
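As a sketch of how that fits into the question's loop (runAll and processOne are names made up here, and processOne is only a stand-in for the jsdom.env call from the question):

```javascript
// Sketch: yield to the event loop between iterations so the callbacks
// jsdom schedules with process.nextTick can run and release old windows.
// processOne() is a placeholder for the jsdom.env(...) call in the question.
function processOne(i) {
  return { id: i, payload: new Array(1000).fill(i) }; // stand-in allocation
}

async function runAll(total) {
  for (let i = 1; i <= total; i++) {
    processOne(i);
    // a zero-delay timeout hands control back to the event loop
    await new Promise(function (resolve) { setTimeout(resolve, 0); });
  }
  return total;
}

runAll(100).then(function (n) {
  console.log("processed " + n + " documents");
});
```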

Upvotes: 1

kyle belle

Reputation: 130

A workaround for this is to run the jsdom-related code in a forked child_process and send back the relevant results when done, then kill the child_process.

Upvotes: 1

CheapSteaks

Reputation: 5011

I was using jsdom 0.6.0 to help scrape some data and ran into the same problem.
window.close only helped slow the memory leak, but it did eventually creep up until the process got killed.

Run the script with node --expose-gc myscript.js.

Until they fix the memory leak, manually calling the garbage collector in addition to calling window.close seems to work:

if (process.memoryUsage().heapUsed > 200000000) { // memory use is above 200MB
    global.gc();
}

I stuck that after the call to window.close. Memory use immediately drops back to baseline (around 50MB for me) every time it gets triggered, with a barely perceptible halt.

Update: also consider calling global.gc() multiple times in succession rather than only once (i.e. global.gc(); global.gc(); global.gc(); global.gc(); global.gc();):

Calling window.gc() multiple times was more effective (based on my imperfect tests), I suspect because it possibly caused chrome to trigger a major GC event rather than a minor one. - https://github.com/cypress-io/cypress/issues/350#issuecomment-688969443
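A guarded variant of that idea, so the script still runs when --expose-gc was not passed (maybeCollect is a helper name made up for this sketch):

```javascript
// Sketch: global.gc only exists when node was started with --expose-gc.
function maybeCollect(times) {
  if (typeof global.gc !== 'function') return false; // flag not passed
  for (var i = 0; i < times; i++) {
    global.gc(); // repeated calls may help trigger a major collection
  }
  return true;
}

// after the call to window.close():
if (process.memoryUsage().heapUsed > 200000000) { // above ~200MB
  maybeCollect(5);
}
```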

Upvotes: 15

Hontoni

Reputation: 1342

With gulp: memory-usage monitoring, periodic cleanup, and window.close().

// assumed requires for this snippet (map and logger are gulp plugins;
// the exact packages are a guess, e.g. vinyl-map and gulp-logger):
var gulp = require('gulp');
var jsdom = require('jsdom');
var jquery = require('jquery');

var gb = setInterval(function () {

    // only call if memory use is above 200MB
    if (process.memoryUsage().heapUsed > 200000000) {
        global.gc();
    }

}, 10000); // 10sec


gulp.task('tester', ['clean:raw2'], function() {

  return gulp.src('./raw/*.html')
    .pipe(logger())
    .pipe(map(function(contents, filename) {


        var doc = jsdom.jsdom(contents);
        var window = doc.parentWindow;
        var $ = jquery(window);

        console.log( $('title').text() );

        var html = window.document.documentElement.outerHTML;

        $( doc ).ready(function() {
            console.log( "document loaded" );
            window.close();
        });

        return html;
    }))
    .pipe(gulp.dest('./raw2'))
    .on('end', onEnd);
});

And I had constantly between 200MB and 300MB usage for 7k files; it took 30 minutes. It might be helpful for someone, as I googled and didn't find anything helpful.

Upvotes: 1

Domenic

Reputation: 112827

You are not giving the program any idle time to do garbage collection. I believe you would run into the same problem with any large object graph created many times in a tight loop with no breaks.

This is substantiated by CheapSteaks's answer, which manually forces the garbage collection. There can't be a memory leak in jsdom if that works, since memory leaks by definition prevent the garbage collector from collecting the leaked memory.

Upvotes: 7

BeMoreDifferent.com

Reputation: 754

I had the same problem with jsdom and switched to cheerio, which is much faster than jsdom and works even after scanning hundreds of sites. Perhaps you should try it, too. The only problem is that it doesn't have all the selectors that you can use in jsdom.

Hope it works for you, too.

Daniel

Upvotes: 4
