Regex to find price in HTML

Question

Disclaimer: I know that parsing HTML with regex is not the correct approach. I am actually just trying to parse text inside the HTML.

I am parsing several pages, and I am looking for prices. Here is what I have so far:

var all = document.body.querySelectorAll(":not(script)");
var regex = /\$[0-9,]+(\.[0-9]{2})?/g;

for (var i = 0; i < all.length; i++) {
    // Elements have a null nodeValue, so test each child node's own text instead
    for (var j = 0; j < all[i].childNodes.length; j++) {
        var node_value = all[i].childNodes[j].nodeValue;
        if (node_value !== null) {
            var matches = node_value.match(regex);
            if (matches !== null && matches.length > 0) {
                alert("that's a match");
            }
        }
    }
}

This particular code can get me prices like this:

This is the current price: $60.00

However, there are some prices that have the following structure:

This is the current price: ^$80.00
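A minimal sketch of why the loop above misses this case, assuming the `^` marks a currency sign rendered in its own element (for example a superscript), so the browser stores `$` and `80.00` as separate text nodes. The arrays below are hypothetical stand-ins for the `nodeValue`s the loop would see, not real DOM output.

```javascript
// Same pattern as in the question, without the g flag so test()/match() are stateless.
const priceRegex = /\$[0-9,]+(\.[0-9]{2})?/;

// Hypothetical text-node values. splitNodes assumes the "$" sits in its own element.
const plainNodes = ["This is the current price: $60.00"];
const splitNodes = ["This is the current price: ", "$", "80.00"];

// Node-by-node matching, as the question's inner loop does.
function matchPerNode(values) {
  return values.some((v) => priceRegex.test(v));
}

// Matching against the concatenated text, which is what textContent returns.
function matchJoined(values) {
  const m = values.join("").match(priceRegex);
  return m ? m[0] : null;
}

console.log(matchPerNode(plainNodes)); // true
console.log(matchPerNode(splitNodes)); // false: "$" and "80.00" are never tested together
console.log(matchJoined(splitNodes)); // $80.00
```

Joining the text first is what lets a split price match at all; the cost is that the match no longer points at a single text node.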

How could I improve the algorithm to find those prices? Should I use a regex to look for ^symbolprice in the outer for loop?

Important: once I find a match, I need to find out which DOM element holds that price, specifically the innermost element containing it. For example:

$80.00

I would need to identify that innermost element as the one holding the price, not the enclosing div.

Niet the Dark Absol · Accepted Answer

Try this:

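// Search all of the page's text at once, so a price split across elements still matches
var text = document.body.textContent || document.body.innerText,
    // Allow whitespace after "$" and around the decimal point
    regex = /\$\s*[0-9,]+(?:\s*\.\s*\d{2})?/g,
    match = text.match(regex);
if( match) {
    // Remove the whitespace picked up between the pieces of the price
    match = match[0].replace(/\s/g,"");
    alert("Match found: "+match);
}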
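The whitespace handling can be checked without a browser. This sketch reuses the answer's pattern on a hypothetical string standing in for `document.body.textContent`; `extractPrices` is an illustrative helper name, and it returns every price rather than only the first.

```javascript
// The whitespace-tolerant pattern from the answer, with the g flag
// so match() returns every price found in the text.
const priceRegex = /\$\s*[0-9,]+(?:\s*\.\s*\d{2})?/g;

// Hypothetical helper: all prices in a block of text, internal whitespace removed.
function extractPrices(text) {
  const matches = text.match(priceRegex) || [];
  return matches.map((m) => m.replace(/\s/g, ""));
}

// Hypothetical textContent where the "$" and the digits came from different
// elements, so the concatenated text has line breaks and spaces inside the price.
const sample = "Was: $\n  120.00\nNow: $ 80 . 00";
console.log(extractPrices(sample)); // [ '$120.00', '$80.00' ]
```

Because `\s*` also spans newlines, the pattern can join a `$` to digits from a later element; on pages with stray dollar signs that is worth keeping in mind.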

Using a recursive search to find the innermost element that holds the price:

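function findPrice(node) {
    node = node || document.body;
    var text = node.textContent || node.innerText,
        regex = /\$\s*[0-9,]+(?:\s*\.\s*\d{2})?/,
        match = text.match(regex);
    if( match) {
        var children = node.children, l = children.length, i, found;
        for( i=0; i<l; i++) {
            // Prefer a child element that contains the whole price on its own
            found = findPrice(children[i]);
            if( found) return found;
        }
        // No child holds the full price, so this is the innermost element
        return node;
    }
    return null;
}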
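The recursive search above can be tried outside a browser with a hypothetical element shape `{ tag, children }`, where each child is either a string (a text node) or another element; `textOf` plays the role of `textContent`. This is a sketch under those assumptions, not the DOM API, and `findPriceElement` is an illustrative name.

```javascript
// Same non-global pattern as the recursive answer.
const priceRegex = /\$\s*[0-9,]+(?:\s*\.\s*\d{2})?/;

// Hypothetical element shape: { tag, children }, each child a string (text node)
// or another element. textOf concatenates text in document order, like textContent.
function textOf(node) {
  return node.children
    .map((c) => (typeof c === "string" ? c : textOf(c)))
    .join("");
}

// Descend while some child element still contains a full price; the deepest
// element reached is the innermost one holding the whole price.
function findPriceElement(node) {
  const match = textOf(node).match(priceRegex);
  if (!match) return null;
  for (const child of node.children) {
    if (typeof child === "string") continue; // text nodes have no children to search
    const inner = findPriceElement(child);
    if (inner) return inner;
  }
  return { element: node, price: match[0] };
}

// Hypothetical markup: <div>This is the current price: <span><sup>$</sup>80.00</span></div>
const page = {
  tag: "div",
  children: [
    "This is the current price: ",
    { tag: "span", children: [{ tag: "sup", children: ["$"] }, "80.00"] },
  ],
};

const result = findPriceElement(page);
console.log(result.element.tag, result.price); // span $80.00
```

When the price is split across siblings (here the `sup` and the bare `80.00`), the search stops at their common parent, which is the innermost element that holds the complete price.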
