Reputation: 5
I would like to extract text from HTML with pure JavaScript (this is for a Chrome extension).
Specifically, I would like to find a piece of text on a page and extract the text that follows it.
Even more specifically, on a page like
https://picasaweb.google.com/kevin.smilak/BestOfAmericaSGrandCircle#4974033581081755666
I would like to find the text "Latitude" and extract the value that comes after it. The HTML there is not very structured.
What is an elegant way to do this?
Upvotes: 0
Views: 9632
Reputation: 497
There is no elegant solution in my opinion because, as you said, the HTML is not structured, and the words "Latitude" and "Longitude" depend on the page's localization. The best I can think of is relying on the cardinal points, which might not change:
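var data = document.getElementById("lhid_tray").innerHTML;
// match() returns null when nothing is found, so guard before reading group 1
var latMatch = data.match(/(\d+\.\d+)°\s*[NS]/);
var lonMatch = data.match(/(\d+\.\d+)°\s*[EW]/);
var lat = latMatch ? latMatch[1] : null, lon = lonMatch ? lonMatch[1] : null;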
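If a signed decimal value is more convenient than the raw match, the captured number and the cardinal letter can be combined. The following is only a minimal sketch, assuming the text contains values shaped like "36.872068° N"; the helper name extractCoordinate is made up for illustration, and it takes a plain string so it is not tied to any particular element.

```javascript
// Hypothetical helper: pull a coordinate such as "36.872068° N" out of a string
// and return it as a signed number (S and W become negative).
// `cardinals` is the pair of letters to look for: "NS" for latitude, "EW" for longitude.
function extractCoordinate(text, cardinals) {
  var re = new RegExp('(\\d+(?:\\.\\d+)?)\\s*°\\s*([' + cardinals + '])');
  var m = text.match(re);
  if (!m) {
    return null; // nothing that looks like a coordinate
  }
  var value = parseFloat(m[1]);
  return (m[2] === 'S' || m[2] === 'W') ? -value : value;
}

// Sample string standing in for the page text.
var sample = 'Latitude: 36.872068° N Longitude: 111.387291° W';
var lat = extractCoordinate(sample, 'NS');
var lon = extractCoordinate(sample, 'EW');
```

Returning null instead of throwing keeps the extension from breaking on photos that carry no location data.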
Upvotes: 2
Reputation: 2675
Well, if a more general answer is required for other sites, then you can try something like:
var text = document.body.innerHTML;
text = text.replace(/(<([^>]+)>)/ig,""); //strip out all HTML tags
var latArray = text.match(/Latitude:?\s*[^0-9]*[0-9]*\.?[0-9]*\s*°\s*[NS]/gim);
//search for and return an array of all matches of:
//"latitude", an optional ":", white space, any non-digits, a number, white space, "°", white space, N or S
//flags: g (global, find every match), i (ignore case), m (multi-line; no effect here because the pattern has no ^ or $)
For that example an array of 1 element containing "Latitude: 36.872068° N" is returned (which should be easy to parse).
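To show what "easy to parse" might look like, here is a rough sketch that runs the same strip-and-match steps on a string and then splits each hit into a number and a hemisphere letter. The sample markup is made up for illustration and the real page may differ; the `|| []` fallback is an addition so that a page without a match yields an empty array instead of null.

```javascript
// Made-up markup standing in for document.body.innerHTML.
var html = '<div>Latitude: <em>36.872068° N</em><br>Longitude: <em>111.387291° W</em></div>';

// Strip the tags, then collect every "Latitude ... ° N/S" occurrence.
var text = html.replace(/<[^>]+>/g, '');
var latArray = text.match(/Latitude:?\s*[^0-9]*[0-9]*\.?[0-9]*\s*°\s*[NS]/gi) || [];

// Parse each hit: the number follows the label, the hemisphere is the last character.
var parsed = latArray.map(function (hit) {
  return {
    degrees: parseFloat(hit.replace(/^[^0-9]*/, '')),
    hemisphere: hit.charAt(hit.length - 1).toUpperCase()
  };
});
```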
Upvotes: 0
Reputation: 360046
The text you're interested in is found inside a div with class gphoto-exifbox-exif-field. Since this is for a Chrome extension, we have document.querySelectorAll, which makes selecting that element easy:
var div = document.querySelectorAll('div.gphoto-exifbox-exif-field')[4],
text = div.innerText;
/* text looks like:
"Filename: img_3474.jpg
Camera: Canon
Model: Canon EOS DIGITAL REBEL
ISO: 800
Exposure: 1/60 sec
Aperture: 5.0
Focal Length: 18mm
Flash Used: No
Latitude: 36.872068° N
Longitude: 111.387291° W"
*/
It's easy to get what you want now:
var lng = text.split('Longitude:')[1].trim(); // "111.387291° W"
I used trim() instead of split('Longitude: ') since that's not actually an ordinary space character in the innerText (URL-encoded, it's %C2%A0, which is the UTF-8 encoding of U+00A0, a non-breaking space).
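A quick check of the whitespace issue, written against a plain string so it can be tried in any console. %C2%A0 decodes to U+00A0 (no-break space); the sample text below assumes that this is the character innerText returns after each colon on that page.

```javascript
// "%C2%A0" is the percent-encoded UTF-8 form of U+00A0 (no-break space).
var nbsp = decodeURIComponent('%C2%A0');

// Sample text written with an explicit no-break space after each colon
// (an assumption about what innerText returns on that page).
var text = 'Latitude:\u00a036.872068° N\nLongitude:\u00a0111.387291° W';

// Splitting on the label plus an ordinary space finds nothing to split on.
var withSpace = text.split('Longitude: ');

// Splitting on the label alone works, and trim() also strips U+00A0.
var lng = text.split('Longitude:')[1].trim();
```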
Upvotes: 1
Reputation: 44078
I would query the DOM and just collect the image information into an object, so you can reference any property you want.
E.g.
function getImageData() {
    var props = {};
    Array.prototype.forEach.call(
        document.querySelectorAll('.gphoto-exifbox-exif-field > em'),
        function (prop) {
            // key: the label text before the <em>, with whitespace and colons removed
            props[prop.previousSibling.nodeValue.replace(/[\s:]+/g, '')] = prop.textContent;
        }
    );
    return props;
}
var data = getImageData();
console.log(data.Latitude); // 36.872068° N
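One thing to keep in mind is how the property names come out. The snippet below applies the same replacement to a few label strings to illustrate; the exact whitespace in the labels is an assumption, and toKey is a name made up for this sketch.

```javascript
// Same replacement as in getImageData(): drop all whitespace and colons.
function toKey(label) {
  return label.replace(/[\s:]+/g, '');
}

// Label strings as they might appear before each <em> (whitespace is assumed).
var keys = ['Latitude:\u00a0', 'Focal Length: ', 'Flash Used: '].map(toKey);
```

Since the space inside a label is removed as well, multi-word fields would be read as data.FocalLength and data.FlashUsed.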
Upvotes: 0
Reputation: 178422
You could do:
var str = document.getElementsByClassName("gphoto-exifbox-exif-field")[4].innerHTML;
var latPos = str.indexOf('Latitude');
var lat = str.substring(str.indexOf('<em>', latPos) + 4, str.indexOf('</em>', latPos));
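The same indexOf approach can be wrapped in a small function that works for any label and checks for -1 along the way. This is just a sketch: valueAfterLabel is a hypothetical name, and the sample markup is made up, so the real innerHTML may look different.

```javascript
// Hypothetical helper: return the text of the first <em>...</em> that follows
// `label` in `html`, or null if any of the pieces is missing.
function valueAfterLabel(html, label) {
  var labelPos = html.indexOf(label);
  if (labelPos === -1) return null;
  var start = html.indexOf('<em>', labelPos);
  if (start === -1) return null;
  var end = html.indexOf('</em>', start);
  if (end === -1) return null;
  return html.substring(start + 4, end); // 4 is the length of "<em>"
}

// Made-up markup for illustration; the real innerHTML may differ.
var sample = 'Latitude:&nbsp;<em>36.872068° N</em><br>Longitude:&nbsp;<em>111.387291° W</em>';
var lat = valueAfterLabel(sample, 'Latitude');
var lng = valueAfterLabel(sample, 'Longitude');
var none = valueAfterLabel(sample, 'ISO');
```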
Upvotes: 1