Erfan

Reputation: 1192

How to get the numbers in an element's inner text with a JavaScript regex

I want to match the numbers in the inner text of an HTML document with a JavaScript regex so that I can replace them.
For example, in the code below I want to match 1,2,3,4,5,6,1,2,3,1,2,3, but not the 444 inside the div tag.

<body>
  aaaa123aaa456
  <div style="background: #444">aaaa123aaaa</div>
  aaaa123aaa
</body>

What could be the regular expression?

Upvotes: 0

Views: 2837

Answers (3)

Erfan

Reputation: 1192

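Just to answer my old question: it is possible with a lookahead.
The expression below matches a digit only if it is followed by any run of characters other than < and >, and then by either < or the end of the string. A digit inside a tag reaches a > first, so it is skipped.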

/\d(?=[^<>]*(<|$))/g

To replace the digits, pass a callback that looks each one up (map is assumed to be an object keyed by digit):

    html.replace(/\d(?=[^<>]*(<|$))/g, function($0) {
        return map[$0]
    });

Source of the answer: https://www.drupal.org/node/619198#comment-5710052
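As a minimal sketch of the lookahead in use, the snippet below runs it against a one-line version of the markup from the question. The function name replaceDigitsOutsideTags and the bracket-wrapping map are hypothetical, chosen only so the effect is easy to see; any object keyed by digit would do. It assumes attribute values never contain a literal < or >.

```javascript
// Hypothetical map for illustration: wrap each digit in brackets.
var map = {};
for (var d = 0; d <= 9; d++) {
  map[String(d)] = '[' + d + ']';
}

// Replace every digit that is followed by non-tag characters and then
// either the start of the next tag ("<") or the end of the string.
function replaceDigitsOutsideTags(html, digitMap) {
  return html.replace(/\d(?=[^<>]*(<|$))/g, function (digit) {
    return digitMap[digit];
  });
}

var html = 'aaaa123aaa456<div style="background: #444">aaaa123aaaa</div>aaaa123aaa';
var result = replaceDigitsOutsideTags(html, map);
console.log(result);
```

The 444 in the style attribute is left alone because the first angle bracket after it is a >, which the lookahead rejects.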

Upvotes: 0

Mike Samuel

Reputation: 120546

Your best bet is to use innerText or textContent to get at the text without the tags and then just use the regex /\d/g to get the numbers.

function digitsInText(rootDomNode) {
  var text = rootDomNode.textContent || rootDomNode.innerText;
  return text.match(/\d/g) || [];
}

For example,

alert(digitsInText(document.body));

If your HTML is not in the DOM, you can try to strip the tags yourself; see "JavaScript: How to strip HTML tags from string?"


Since you need to do a replacement, I would still try to walk the DOM and operate on text nodes individually, but if that is out of the question, try:

var HTML_TOKEN = /(?:[^<\d]|<(?!\/?[a-z]|!--))+|<!--[\s\S]*?-->|<\/?[a-z](?:[^">']|"[^"]*"|'[^']*')*>|(\d+)/gi;

function incrementAllNumbersInHtmlTextNodes(html) {
  return html.replace(HTML_TOKEN, function (all, digits) {
    if ("string" === typeof digits) {
      return "" + (+digits + 1);
    }
    return all; 
  });
}

then

incrementAllNumbersInHtmlTextNodes(
    '<b>123</b>Hello, World!<p>I <3 Ponies</p><div id=123>245</div>')

produces

    '<b>124</b>Hello, World!<p>I <4 Ponies</p><div id=123>246</div>'

It can get confused about where special elements like <script> end, and it won't recognize entity-encoded digits, but it should work otherwise.
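The text-node walk recommended earlier in this answer is not spelled out, so here is one way it might look. This is a sketch, not the answerer's code: the function name replaceDigitsInTextNodes is hypothetical, and it relies only on the standard nodeType, nodeName, nodeValue and childNodes properties. Skipping script and style elements is an added assumption about what you would want.

```javascript
// Recursively visit every text node under `node` and replace each digit
// with whatever `replacer` returns for it. Attributes are never touched,
// because only text nodes (nodeType 3) are modified.
function replaceDigitsInTextNodes(node, replacer) {
  if (node.nodeType === 3) {
    node.nodeValue = node.nodeValue.replace(/\d/g, replacer);
    return;
  }
  // Assumption: text inside <script> and <style> should be left as-is.
  if (/^(script|style)$/i.test(node.nodeName)) {
    return;
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    replaceDigitsInTextNodes(children[i], replacer);
  }
}

// In a browser you might call it like this:
// replaceDigitsInTextNodes(document.body, function (digit) {
//   return String((+digit + 1) % 10);
// });
```

Because the replacement happens on nodeValue rather than on an HTML string, entity-encoded text and attribute values need no special handling.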

Upvotes: 4

Barney

Reputation: 16466

You don't necessarily need a RegExp to get the text content of an element while excluding the text of its descendant elements. In fact, I'd advise against it, as matching HTML with a RegExp is notoriously difficult. There are DOM solutions:

function getImmediateText(element){
    var text = '';

    // Text and elements are all DOM nodes, so we can loop over the immediate children.
    for (var i = 0; i < element.childNodes.length; ++i) {
        var node = element.childNodes[i];
        // nodeType 3 is a text node
        if (node.nodeType === 3) {
            text += node.nodeValue;
        }
    }

    return text;
}

var bodyText = getImmediateText(document.getElementsByTagName('body')[0]);

This function returns only the element's immediate text content as a string, so text inside child elements is not included. You could then extract the numbers from that string with a RegExp, using something like this:

// match() returns null when there are no digits, so fall back to an empty array.
var numberString = (bodyText.match(/\d+/g) || []).join('');

Upvotes: 0
