Reputation: 100486
I am confused about the best way to find the dimensions (for example, the naturalWidth) of an image given only its URL, which is most often found in the src attribute of an <img> tag.
My goal is to take a URL to a news article as input and use machine learning to find the five biggest pictures (.jpg, .png, etc.) in the document. The problem with doing this on the front-end is that I don't know of a way to use AJAX to HTTP GET the HTML from an arbitrary page on an arbitrary server, because of CORS restrictions.
However, using Node.js or some other server technology, I can request the HTML from other servers (as one would expect), but I don't know a way of getting the image sizes without downloading the images first. The problem is that I want the images downloaded on the front-end, not the back-end, so downloading them with Node.js is wasted effort if it's only to check their dimensions.
Has anyone experienced this exact problem before? I'm not sure how to proceed. As I said, my goal is to download the images on the front-end and keep the ones that are wider than, say, 300px.
Upvotes: 1
Views: 548
Reputation: 1554
Both approaches work; which one is better depends largely on the performance you need.
It seems to me that the simplest way for you would be on the client side, where you only need a few lines of JavaScript:
var img = new Image();
img.onload = function () {
    // width and height are only available once the image has loaded
    console.log(this.width + 'x' + this.height);
};
img.src = 'http://www.google.com/intl/en_ALL/images/logo.gif';
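To connect this to the goal in the question (keep images around 300px wide or more, then take the five biggest), here is a minimal sketch. The names loadSize and pickLargest, and the choice to rank by pixel area, are assumptions for illustration and not part of the original answer. loadSize is browser-only because it relies on Image; pickLargest is a plain function.

```javascript
// Browser-only: resolves with the natural dimensions of the image at `url`.
function loadSize(url) {
    return new Promise(function (resolve, reject) {
        var img = new Image();
        img.onload = function () {
            resolve({ url: url, width: img.naturalWidth, height: img.naturalHeight });
        };
        img.onerror = function () {
            reject(new Error('Could not load ' + url));
        };
        img.src = url;
    });
}

// Pure helper: keep images at least `minWidth` wide, largest pixel area first,
// and return at most `limit` of them.
function pickLargest(sizes, minWidth, limit) {
    return sizes
        .filter(function (s) { return s.width >= minWidth; })
        .sort(function (a, b) { return b.width * b.height - a.width * a.height; })
        .slice(0, limit);
}
```

In a browser, something like Promise.all(urls.map(loadSize)) followed by pickLargest(sizes, 300, 5) would combine the two. Note that Promise.all rejects as soon as one image fails to load, so a real page would probably want to handle failures per image.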
This is also possible on the server side, but you will need to install GraphicsMagick or ImageMagick. I'd go with GraphicsMagick, as it is faster.
Once you have installed both the program and its Node module (npm install gm), you can get the width and height like this:
var gm = require('gm');

// obtain the size of an image
gm('test.jpg').size(function (err, size) {
    if (err) {
        console.error(err);
        return;
    }
    console.log(size.width + 'x' + size.height);
});
This other module also looks good. I haven't used it, but it looks promising: https://github.com/netroy/image-size
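On the "without downloading the images first" concern: dimensions can be read without decoding the whole image, because formats such as PNG store them near the start of the file. As a rough illustration for PNG only (a hand-written sketch, not the API of gm or image-size; pngSize is a hypothetical name), the width and height are big-endian 32-bit integers at byte offsets 16 and 20, inside the IHDR chunk that follows the 8-byte signature:

```javascript
// Reads width/height from the first 24 bytes of a PNG file.
// `buf` is a Node.js Buffer holding at least the start of the file.
function pngSize(buf) {
    var signature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
    if (buf.length < 24 || !buf.subarray(0, 8).equals(signature)) {
        throw new Error('Not a PNG, or too few bytes');
    }
    // Bytes 8-11 are the chunk length, bytes 12-15 the chunk type.
    if (buf.toString('ascii', 12, 16) !== 'IHDR') {
        throw new Error('IHDR chunk not found where expected');
    }
    return { width: buf.readUInt32BE(16), height: buf.readUInt32BE(20) };
}
```

On the server, those first bytes could be fetched with an HTTP Range request where the remote server supports one. Other formats, JPEG in particular, need more involved parsing, which is the kind of work a dedicated module takes care of.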
To get the img URLs from the HTML string
You can fetch the HTML with a simple HTTP request, then use a regular expression with a capture group to extract the URLs. Because you want to match globally (the g flag), i.e. more than once, you need to call exec in a loop: match ignores capture groups when matching globally.
This way you'll have all the sources in an array.
For example:
var m;
var urls = [];
var rex = /<img[^>]+src="?([^"\s]+)"?\s*\/>/g;
// this is your HTML string
var str = '<img src="http://example.com/one.jpg" />\n <img src="http://example.com/two.jpg" />';
while ( ( m = rex.exec( str ) ) !== null ) {
    urls.push( m[1] );
}
console.log( urls );
// [ "http://example.com/one.jpg", "http://example.com/two.jpg" ]
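The pattern above only matches tags where the src value is immediately followed by />, and real pages often use relative paths. A slightly more tolerant sketch, assuming a hypothetical helper extractImageUrls and a pageUrl that you already know, accepts double-quoted, single-quoted, or unquoted values and resolves each one against the page URL with the standard URL constructor. Regular expressions remain fragile on arbitrary HTML, so treat this as an approximation and not a parser:

```javascript
// Collects absolute image URLs from an HTML string.
// `pageUrl` is the address the HTML was fetched from (used for relative paths).
function extractImageUrls(html, pageUrl) {
    // src may be double-quoted, single-quoted, or unquoted
    var rex = /<img\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/gi;
    var urls = [];
    var m;
    while ((m = rex.exec(html)) !== null) {
        var raw = m[1] || m[2] || m[3];
        if (!raw) {
            continue; // empty src attribute
        }
        try {
            urls.push(new URL(raw, pageUrl).href);
        } catch (e) {
            // skip values that cannot be parsed as a URL
        }
    }
    return urls;
}
```

Because the pattern requires whitespace directly before src, attributes such as data-src are not picked up by mistake.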
Hope it helps.
Upvotes: 2