Reputation: 788
I found a few reference to people having a similar issue where the answer always was, make sure you call window.close() when done. However that does not seem to be working for me (node 0.8.14 and jsdom 0.3.1)
A simple repro
var util = require('util');
var jsdom=require('jsdom');
function doOne() {
var htmlDoc = '<html><head></head><body id="' + i + '"></body></html>';
jsdom.env(htmlDoc, null, null, function(errors, window) {
window.close();
});
}
for (var i=1;i< 100000;i++ ) {
doOne();
if(i % 500 == 0) {
console.log(i + ":" + util.inspect(process.memoryUsage()));
}
}
console.log ("done");
Output I get is
500:{ rss: 108847104, heapTotal: 115979520, heapUsed: 102696768 }
1000:{ rss: 198250496, heapTotal: 194394624, heapUsed: 190892120 }
1500:{ rss: 267304960, heapTotal: 254246912, heapUsed: 223847712 }
...
11000:{ rss: 1565204480, heapTotal: 1593723904, heapUsed: 1466889432 }
At this point the fan goes wild and the test actually stops...or at leasts starts going very slowly
Does anyone have any other tips than window.close to get rid of the memory leak (or it sure looks like a memory leak)
Thanks!
Peter
Upvotes: 18
Views: 5088
Reputation: 8300
JSDom defers certain operations using process.nextTick
.
When calling JSDom in a synchronous loop, the event loop never gets to run. Thus references to old JSDom window objects are retained and node runs out of memory.
By running some asynchronous code, the event loop will be allowed to run, and it should solve memory leak.
You can simply use setTimeout
with zero delay to allow the event loop to run.
await new Promise(resolve => setTimeout(resolve, 0));
Upvotes: 1
Reputation: 130
A work around for this is to run the jsdom related code in a forked child_process and send back the relevant results when done. then kill the child_process.
Upvotes: 1
Reputation: 5011
Using jsdom 0.6.0 to help scrape some data and ran into the same problem.
window.close
only helped slow the memory leak, but it did eventually creep up till the process got killed.
Running the script with
node --expose-gc myscript.js
Until they fix the memory leak, manually calling the garbage collector in addition to calling window.close
seems to work:
if (process.memoryUsage().heapUsed > 200000000) { // memory use is above 200MB
global.gc();
}
Stuck that after the call to window.close. Memory use immediately drops back to baseline (around 50MB for me) every time it gets triggered. Barely perceptible halt.
update: also consider calling global.gc()
multiple times in succession rather than only once (i.e. global.gc();global.gc();global.gc();global.gc();global.gc();
)
Calling window.gc() multiple times was more effective (based on my imperfect tests), I suspect because it possibly caused chrome to trigger a major GC event rather than a minor one. - https://github.com/cypress-io/cypress/issues/350#issuecomment-688969443
Upvotes: 15
Reputation: 1342
with gulp, memory usage, cleanup, variable delete, window.close()
var gb = setInterval(function () {
//only call if memory use is bove 200MB
if (process.memoryUsage().heapUsed > 200000000) {
global.gc();
}
}, 10000); // 10sec
gulp.task('tester', ['clean:raw2'], function() {
return gulp.src('./raw/*.html')
.pipe(logger())
.pipe(map(function(contents, filename) {
var doc = jsdom.jsdom(contents);
var window = doc.parentWindow;
var $ = jquery(window);
console.log( $('title').text() );
var html = window.document.documentElement.outerHTML;
$( doc ).ready(function() {
console.log( "document loaded" );
window.close();
});
return html;
}))
.pipe(gulp.dest('./raw2'))
.on('end', onEnd);
});
and I had constatly between 200mb - 300mb usage, for 7k files. it took 30 minutes. It might be helpful for someone, as i googled and didnt find anything helpful.
Upvotes: 1
Reputation: 112827
You are not giving the program any idle time to do garbage collection. I believe you will run into the same problem with any large object graph created many times tightly in a loop with no breaks.
This is substantiated by CheapSteaks's answer, which manually forces the garbage collection. There can't be a memory leak in jsdom if that works, since memory leaks by definition prevent the garbage collector from collecting the leaked memory.
Upvotes: 7
Reputation: 754
I had the same problem with jsdom and switcht to cheerio, which is much faster than jsdom and works even after scanning hundreds of sites. Perhaps you should try it, too. Only problem is, that it dosent have all the selectors which you can use in jsdom.
hope it works for you, too.
Daniel
Upvotes: 4