Strider

Reputation: 3759

Performance problem when replacing innerHTML text using a regular expression

I am facing a performance issue when calling a function that replaces text in innerHTML using a regular expression:

function getReplacedText(textToReplace) {
  return textToReplace.replace(/\<img src=[\"|\']([\S\s]+\\)*([\S\s]+).png[\"|\']\/\>/i,"*$2*");
}

The goal of this replacement is to take the innerHTML of a contentEditable div in a keyup handler and replace each img tag with the name of its image file. I need this to know whether the resulting text exceeds the maximum length allowed for the editable div.

function keyupHandler(event) {
  var myEditableDiv = document.getElementById("editableDiv");
  const currentText = getReplacedText(myEditableDiv.innerHTML);
  if (currentText.length >= 750) { //750 is the max length
    event.preventDefault();
  }
}

For example, the desired output for abc <img src="assets\test\1F619.png"> def would be abc *1F619* def.

When I don't call getReplacedText, I don't have any performance problem. Could you please suggest a better approach, or a better way to use the regular expression?

This is an example of text at which performance begins to degrade:

dsd<img src="assets\test\1F619.png"/><img src="assets\test\1F619.png"/><img src="assets\test\1F629.png"/><img src="assets\test\1F630.png"/>sdfsdfsdffsdf<img src="assets\test\1F629.png"/>sdfsdsdfsdf<img src="assets\test\1F627.png"/><img src="assets\test\1F631.png"/>sdfsdfsdf<img src="assets\test\1F631.png"/>sdfsdfsdf<img src="assets\test\1F632.png"/>sdfsdfs<img src="assets\test\1F629.png"/><img src="assets\test\1F629.png"/>sdfs<img src="assets\test\1F631.png"/>df<img src="assets\test\1F632.png"/>sdfsdfsdf
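For comparison, here is a minimal sketch of a getReplacedText that avoids nested quantifiers. In the original pattern, ([\S\s]+\\)* followed by ([\S\s]+) can divide the same run of characters at the backslashes in exponentially many ways, and [\S\s] can also run past the end of one tag into the next. When no match is possible (for example, innerHTML serializes <img> without a trailing /, so \/\> can never match), the engine most likely has to try all of those divisions before giving up, which is the classic catastrophic-backtracking pattern. The original also has no g flag, so even when it matches it replaces only the first tag. The sketch assumes, as in the examples, that src is the first attribute and the images are .png files; MAX_LENGTH and exceedsMaxLength are hypothetical names.

```javascript
// Sketch only: assumes src is the first attribute of every <img> and that the
// images are .png files, as in the examples above.
function getReplacedText(textToReplace) {
  // [^"'>]* cannot run past the closing quote or the end of the tag, and the
  // file-name group cannot contain "\" or "/", so any retrying on a failed
  // match stays inside a single src value. The g flag replaces every tag,
  // and \/? accepts both <img ...> and <img .../>.
  return textToReplace.replace(
    /<img\s+src=["'](?:[^"'>]*[\\/])?([^"'\\/>]+)\.png["']\s*\/?>/gi,
    "*$1*"
  );
}

// Hypothetical helper using the 750-character limit from keyupHandler.
const MAX_LENGTH = 750;
function exceedsMaxLength(html) {
  return getReplacedText(html).length >= MAX_LENGTH;
}

console.log(getReplacedText('abc <img src="assets\\test\\1F619.png"> def'));
// → abc *1F619* def
console.log(getReplacedText('dsd<img src="assets\\test\\1F619.png"/><img src="assets\\test\\1F629.png"/>sdf'));
// → dsd*1F619**1F629*sdf
```

Unlike the answers below, this sketch does not handle src appearing after other attributes; it trades generality for a pattern that is easy to read and cheap to evaluate on every keyup.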

Upvotes: 0

Views: 153

Answers (3)

user557597

You don't need a DOM to parse HTML tags!

This is the fastest way to do it, and it won't choke on possibly malformed HTML.

Find

/<img(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\ssrc\s*=\s*(?:(['"])(?:(?!\1)[\S\s])*?((?:(?!\1|\\)[\S\s])*?)\.png\s*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+>/

Replace *$2*

https://regex101.com/r/bCYXV1/1

Explained

                        # Begin 'img' tag
 < img
 (?= \s )
 (?=                    # Assertion (a pseudo atomic group)
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s src \s* = \s*       # src attribute
      (?:
           ( ['"] )               # (1), Quote

           (?:
                (?! \1 )
                [\S\s] 
           )*?
           (                      # (2 start)
                (?:
                     (?! \1 | \\ )
                     [\S\s] 
                )*?
           )                      # (2 end)
           \.png                  # find the 'png' file
           \s* 
           \1          
      )
 )
                        # Have the png file, just match the rest of tag
 \s+ 
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]? )+

 >                      # End img tag

var input = "dsd<img src=\"assets\\test\\1F619.png\"><img src=\"assets\\test\\1F619.png\"><img src=\"assets\\test\\1F629.png\"><img src=\"assets\\test\\1F630.png\">sdfsdfsdffsdf<img src=\"assets\\test\\1F629.png\">sdfsdsdfsdf<img src=\"assets\\test\\1F627.png\"><img src=\"assets\\test\\1F631.png\">sdfsdfsdf<img src=\"assets\\test\\1F631.png\">sdfsdfsdf<img src=\"assets\\test\\1F632.png\">sdfsdfs<img src=\"assets\\test\\1F629.png\"><img src=\"assets\\test\\1F629.png\">sdfs<img src=\"assets\\test\\1F631.png\">df<img src=\"assets\\test\\1F632.png\">sdfsdfsdf";
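var imgPngRegex = /<img(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\ssrc\s*=\s*(?:(['"])(?:(?!\1)[\S\s])*?((?:(?!\1|\\)[\S\s])*?)\.png\s*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+>/g;
// The leading "\n" only puts each replacement on its own line in the demo output
console.log(input.replace(imgPngRegex, "\n*$2*"));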
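To see what the capture groups hold, here is a small sketch that runs the same pattern on the question's example and on two other tags. Because the first lookahead scans the whole tag for src, the attribute can come after other attributes and use either quote style, while a tag whose src does not end in .png is left untouched. replaceImgTags is a hypothetical helper name.

```javascript
// The pattern from this answer, with the g flag so every tag is replaced
const imgPngRegex = /<img(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\ssrc\s*=\s*(?:(['"])(?:(?!\1)[\S\s])*?((?:(?!\1|\\)[\S\s])*?)\.png\s*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+>/g;

// Hypothetical helper: $2 is the file name without folders or the .png extension
function replaceImgTags(html) {
  return html.replace(imgPngRegex, "*$2*");
}

// src as the only attribute, double quotes (the question's example)
console.log(replaceImgTags('abc <img src="assets\\test\\1F619.png"> def'));
// → abc *1F619* def

// src after another attribute, single quotes
console.log(replaceImgTags("<img alt=\"smile\" src='assets\\test\\1F600.png'>"));
// → *1F600*

// Not a .png file, so the lookahead fails and the tag is kept as-is
console.log(replaceImgTags('<img src="assets\\test\\photo.gif">'));
// → <img src="assets\test\photo.gif">
```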

Upvotes: 1

CertainPerformance

Reputation: 371138

Avoid using regular expressions to parse HTML. Use DOMParser instead: find the <img> tags and replace each one with a text node containing only the last part of the src:

const input = String.raw`dsd<img src="assets\test\1F619.png"><img src="assets\test\1F619.png"><img src="assets\test\1F629.png"><img src="assets\test\1F630.png">sdfsdfsdffsdf<img src="assets\test\1F629.png">sdfsdsdfsdf<img src="assets\test\1F627.png"><img src="assets\test\1F631.png">sdfsdfsdf<img src="assets\test\1F631.png">sdfsdfsdf<img src="assets\test\1F632.png">sdfsdfs<img src="assets\test\1F629.png"><img src="assets\test\1F629.png">sdfs<img src="assets\test\1F631.png">df<img src="assets\test\1F632.png">sdfsdfsdf`;
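const doc = new DOMParser().parseFromString(input, 'text/html');
doc.querySelectorAll('img[src]').forEach((img) => {
  // Use the raw attribute and split on "/" or "\" so the folders are dropped either way
  const name = img.getAttribute('src').match(/[^\/\\]+(?=\.png$)/i);
  // Skip images that are not .png instead of throwing on a null match
  if (name) {
    img.replaceWith('*' + name[0] + '*');
  }
});
console.log(doc.body.innerHTML);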

Upvotes: 2

Emma

Reputation: 27743

My guess is that a simple expression like this might do the job here:

<img src=["']\s*([^"'\s]+\.png)\s*["']\s*\/?>

or, if we don't need to capture the image path,

<img src=["']\s*[^"'\s]+\.png\s*["']\s*\/?>

would be enough.


Upvotes: 0
