Reputation: 21
I have no experience with regex. I am trying to find a way to detect and delete every character outside of the img tags. In other words, I want to strip all text and all other tags from a given piece of HTML and keep only the img tags. The result should contain just the image tags, like this:
<img src="sourcehere">
Is there a way to do this?
UPDATE: I need specifically a regex that goes in preg_replace. This is what I have done, but it doesn't work:
$buffer ="<html><head></head><body><img src='image.jpg'></body></html>";
$buffer = preg_replace('(?i)<(?!img|/img).*?>', '', $buffer);
echo $buffer; /* should output <img src='image.jpg'> but it doesn't */
Upvotes: 2
Views: 280
Reputation: 18611
Use
preg_replace('/<img[^>]*>(*SKIP)(*FAIL)|./si', '', $buffer)
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  <img                     '<img'
--------------------------------------------------------------------------------
  [^>]*                    any character except: '>' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
  (*SKIP)(*FAIL)           discards the text matched so far (the img tag)
                           and resumes matching after it
--------------------------------------------------------------------------------
  |                        or
--------------------------------------------------------------------------------
  .                        any character (including newlines, because of
                           the s flag)
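To see how the pattern behaves on input with several images and text in between, here is a minimal sketch. The sample HTML and the variable names `$html` and `$imagesOnly` are made up for illustration and do not come from the question.

```php
<?php
// Hypothetical sample input: two images with text and other tags around them.
$html = "<html><body><p>Intro text</p><img src='a.jpg'>\n"
      . "<div>More <b>text</b><img src=\"b.png\" alt=\"B\"></div></body></html>";

// <img[^>]*> matches an img tag; (*SKIP)(*FAIL) discards that match and
// resumes scanning right after it. Every other character is matched by "."
// and replaced with the empty string.
// The s flag lets "." match newlines; the i flag ignores case in the tag name.
$imagesOnly = preg_replace('/<img[^>]*>(*SKIP)(*FAIL)|./si', '', $html);

echo $imagesOnly, PHP_EOL; // <img src='a.jpg'><img src="b.png" alt="B">
```

Because the second alternative consumes one character at a time, everything that is not part of an img tag disappears, including the newline between the two images.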
Upvotes: 0
Reputation: 589
This doesn't need to be some big and fancy regex.
<img[^>]*>
This matches the text "<img", followed by any number of characters other than ">", followed by the closer ">".
Once you have the matches you would just want to write out the matches to a string, or to the document, or however you want to represent them.
EDIT:
To complete what the OP is showing in PHP, you would want to match instead of replace. You don't need to replace all of the non-matching sections; you can just keep the matches. Use preg_match_all rather than preg_match, because preg_match stops after the first image:
$buffer ="<html><head></head><body><img src='image.jpg'></body></html>";
// $matchArray[0] holds every full match of the pattern
preg_match_all("/<img[^>]*>/i", $buffer, $matchArray);
foreach ($matchArray[0] as $match){
    echo $match;
}
prints out:
<img src='image.jpg'>
EDIT:
The problem with replacing every other tag is that any text between the tags is left behind. If you don't care about that, here is something that works using preg_replace():
$buffer ="<html><head></head><body><img src='image.jpg'></body></html>";
$buffer = preg_replace('/(?i)<\\/*(?!img)[^>]*>/', '', $buffer);
echo $buffer; /* outputs <img src='image.jpg'> */
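The caveat about contents between the tags can be seen with a slightly richer input. This is only a sketch: the sample HTML and the names `$tagsStripped`, `$matches` and `$imagesOnly` are hypothetical, and the second half shows one possible way to keep only the images by matching them instead.

```php
<?php
// Hypothetical input with text between the tags.
$buffer = "<body><p>Hello</p><img src='image.jpg'><span>world</span></body>";

// Same tag-stripping pattern as above: removes every tag that is not an img tag.
$tagsStripped = preg_replace('/(?i)<\\/*(?!img)[^>]*>/', '', $buffer);
echo $tagsStripped, PHP_EOL; // Hello<img src='image.jpg'>world

// To drop the leftover text as well, collect the img tags and join them.
preg_match_all('/<img[^>]*>/i', $buffer, $matches);
$imagesOnly = implode('', $matches[0]);
echo $imagesOnly, PHP_EOL; // <img src='image.jpg'>
```

The first result still contains "Hello" and "world" because the pattern only targets tags, not the text nodes around them.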
Upvotes: 0
Reputation: 11807
What are your plans: do you want to log the result to a file, display it in a console, or output it some other way? This worked for me, but actually serializing it to a string might take extra work.
This is jQuery. From my understanding, you want to remove everything but the images from your document.
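// Copy the live document.images collection into a plain array
var arr2 = Array.prototype.slice.call(document.images);
// Empty the body; the image elements stay referenced in arr2
jQuery('body').contents().remove();
// Re-attach only the images
for (var i = 0; i < arr2.length; i++) {
    jQuery('body').append(arr2[i]);
}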
Upvotes: 0