Reputation: 2543
I have a string of HTML that I want to deploy without the <img /> tag.
What I have currently is:
var myHTML = '<p><img class="alignnone size-full wp-image-2857" src="https://files.wordpress.com/2016/05/laptop.jpg?w=750&h=545" alt="https://pixabay.com/en/laptop-printer-office-folder-graph-1016257/" width="750" height="545" /></p> <p>STUFF</p> <p>MORE STUFF</p> <p>EVEN MORE STUFF</p> <p><strong><span style="text-decoration:underline;">OTHER STUFF</span></strong></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p> <p><em>TEST STUFF</em>: WRAP UP STUFF</p> <p><strong><span style="text-decoration:underline;">REST OF STUFF</span></strong></p> <p><em>Aloha POS</em>: KEEP THIS STUFF TOO</p> <p><em>Revel</em>: WHAT STUFF</p> <p>DONE</p> ';
What I think it should look like:
var myHTML2 = '<p></p> <p>STUFF</p> <p>MORE STUFF</p> <p>EVEN MORE STUFF</p> <p><strong><span style="text-decoration:underline;">OTHER STUFF</span></strong></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p> <p><em>TEST STUFF</em>: WRAP UP STUFF</p> <p><strong><span style="text-decoration:underline;">REST OF STUFF</span></strong></p> <p><em>Aloha POS</em>: KEEP THIS STUFF TOO</p> <p><em>Revel</em>: WHAT STUFF</p> <p>DONE</p> ';
What I tried:
myHTML.replace(/<(?!\s*\/?\s*p\b)[^>]*>/gi,'')
But this strips every tag except <p> from the string, and I only want to remove the <img /> tag.
Upvotes: 0
Views: 58
Reputation: 14990
It's not advisable to use a regex to parse HTML because of all the obscure edge cases that can crop up, but it seems that you have some control over the HTML, so you should be able to avoid many of the edge cases the regex police cry about.
<img\s(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
Replace with: nothing (an empty string)
This regex matches an entire <img> tag, including quoted attribute values that themselves contain a > character.
Live demo https://regex101.com/r/pG1oI7/1
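In JavaScript, the replacement could be sketched roughly as follows. This is a minimal example, not the only way to do it: the shortened sample string and the variable names imgTag, sample and withoutImg are made up for illustration, and the g and i flags are an added assumption so that every <img> tag is removed regardless of case.

```javascript
// Regex from this answer; the g and i flags are an added assumption.
var imgTag = /<img\s(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>/gi;

// Shortened, hypothetical version of the sample string.
var sample = '<p><img class="alignnone" src="laptop.jpg?w=750&h=545" width="750" height="545" /></p> <p>STUFF</p>';

// Strings are immutable, so replace() returns a new string
// and the result has to be assigned somewhere.
var withoutImg = sample.replace(imgTag, '');

console.log(withoutImg); // <p></p> <p>STUFF</p>
```

Note that the original string is left untouched, so the return value of replace() is what should be deployed.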
Sample String
<p><img class="alignnone size-full wp-image-2857"
src="https://files.wordpress.com/2016/05/laptop.jpg?w=750&h=545"
alt="https://pixabay.com/en/laptop-printer-office-folder-graph-1016257/"
width="750" height="545" /></p> <p>STUFF</p> <p>MORE STUFF</p> <p>EVEN MORE
STUFF</p> <p><strong><span style="text-decoration:underline;">OTHER
STUFF</span></strong></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p> <p>
<em>TEST STUFF</em>: WRAP UP STUFF</p> <p><strong><span style="text-
decoration:underline;">REST OF STUFF</span></strong></p> <p><em>Aloha
POS</em>: KEEP THIS STUFF TOO</p> <p><em>Revel</em>: WHAT STUFF</p> <p>
DONE</p>
After Replacement
<p></p> <p>STUFF</p> <p>MORE STUFF</p> <p>EVEN MORE
STUFF</p> <p><strong><span style="text-decoration:underline;">OTHER
STUFF</span></strong></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p> <p>
<em>TEST STUFF</em>: WRAP UP STUFF</p> <p><strong><span style="text-
decoration:underline;">REST OF STUFF</span></strong></p> <p><em>Aloha
POS</em>: KEEP THIS STUFF TOO</p> <p><em>Revel</em>: WHAT STUFF</p> <p>
DONE</p>
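The sample above only uses double-quoted attributes. As a rough check of the other branches of the pattern, the sketch below runs it against double-quoted, single-quoted and unquoted attribute values, including values that contain a > character. The input strings are invented for illustration, and the g and i flags are an assumption.

```javascript
var imgTag = /<img\s(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>/gi;

// Hypothetical inputs, one per quoting style handled by the alternation.
var doubleQuoted = '<p><img alt="a > b" src="x.jpg" />text</p>';
var singleQuoted = "<p><img alt='a > b' src='x.jpg' />text</p>";
var unquoted = '<p><img src=x.jpg width=750>text</p>';

// In each case the whole tag is matched, even when a quoted value contains >.
console.log(doubleQuoted.replace(imgTag, '')); // <p>text</p>
console.log(singleQuoted.replace(imgTag, '')); // <p>text</p>
console.log(unquoted.replace(imgTag, ''));     // <p>text</p>
```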
NODE                     EXPLANATION
----------------------------------------------------------------------
  <img                     '<img'
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    [^>=]                    any character except: '>', '='
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ='                       '=\''
----------------------------------------------------------------------
    [^']*                    any character except: ''' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    '                        '\''
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ="                       '="'
----------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    =                        '='
----------------------------------------------------------------------
    [^'"]                    any character except: ''', '"'
----------------------------------------------------------------------
    [^\s>]*                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '>' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  >                        '>'
Upvotes: 1
Reputation: 11
This is not a regex answer, but if you are already using JavaScript you can do what JavaScript was designed for and manipulate the DOM directly, like this:
var html = '<p><img class="alignnone size-full wp-image-2857" src="https://files.wordpress.com/2016/05/laptop.jpg?w=750&h=545" alt="https://pixabay.com/en/laptop-printer-office-folder-graph-1016257/" width="750" height="545" /></p> <p>STUFF</p> <p>MORE STUFF</p> <p>EVEN MORE STUFF</p> <p><strong><span style="text-decoration:underline;">OTHER STUFF</span></strong></p> <p><em>OTHER STUFF</em>: DEMO STUFF</p> <p><em>TEST STUFF</em>: WRAP UP STUFF</p> <p><strong><span style="text-decoration:underline;">REST OF STUFF</span></strong></p> <p><em>Aloha POS</em>: KEEP THIS STUFF TOO</p> <p><em>Revel</em>: WHAT STUFF</p> <p>DONE</p>';
var el = document.createElement('div');
el.innerHTML = html;
var p = el.getElementsByTagName('p')[0]; // the first <p>, which contains the image
var img = p.getElementsByTagName('img')[0]; // the only <img> here; use an id or class to be more specific
console.log(img);
p.removeChild(img); // removeChild must be called on the image's direct parent
var result = el.innerHTML; // the HTML string without the <img />
You will want to use classes or ids if you are going to have lots of images.
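If the string may contain several images, a selector can pick out exactly the ones to drop. The sketch below assumes a browser environment where document exists. The function name removeImages, its optional selector parameter and the shortened demo string are hypothetical; the wp-image-2857 class is taken from the sample HTML.

```javascript
// Hypothetical helper: removes every element matching `selector`
// (by default all <img> elements) from an HTML string.
function removeImages(html, selector) {
  var container = document.createElement('div');
  container.innerHTML = html;

  // querySelectorAll returns a static list, so removing nodes while looping is safe.
  var matches = container.querySelectorAll(selector || 'img');
  for (var i = 0; i < matches.length; i++) {
    matches[i].parentNode.removeChild(matches[i]);
  }
  return container.innerHTML;
}

// Only run the demo where a DOM is available (for example, in a browser).
if (typeof document !== 'undefined') {
  var demo = '<p><img class="wp-image-2857" src="laptop.jpg"></p> <p>STUFF</p>';
  console.log(removeImages(demo, 'img.wp-image-2857')); // <p></p> <p>STUFF</p>
}
```

Calling removeImages(html) without a selector would remove every image, while a class or id selector limits the removal to specific ones.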
Upvotes: 1
Reputation: 6173
You could use this regex to remove the img tag:
<img[^>]+>
I don't know what you were trying to do with the regex you had, honestly. It doesn't need to be complicated: the only "regex construct" that I had to use was [^>]+, which just matches one or more characters that aren't >.
The benefit of using a simple regex is readability and speed. Of course, if you wanted to account for edge cases (such as false positives in embedded JS), you should use an HTML parser.
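For illustration, here is a rough sketch of how the simple pattern behaves, both on a tag like the one in the question and on one kind of edge case. The input strings are made up, and the g and i flags are an assumption.

```javascript
var simpleImg = /<img[^>]+>/gi;

// A tag like the one in the question: removed cleanly.
var typical = '<p><img class="alignnone" src="laptop.jpg" width="750" /></p> <p>STUFF</p>';
console.log(typical.replace(simpleImg, '')); // <p></p> <p>STUFF</p>

// Hypothetical edge case: a > inside an attribute value ends the match early,
// leaving the rest of the tag behind.
var tricky = '<p><img alt="a > b" src="x.jpg" /></p>';
console.log(tricky.replace(simpleImg, '')); // <p> b" src="x.jpg" /></p>
```

As long as the HTML never puts a > inside an attribute value, the simple pattern is enough.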
Upvotes: 1