Reputation: 11
I open an HTML page in Notepad++.
The HTML page contains a lot of markup, most importantly this tag:
<div id="issue_content">CONTENT</div>
I'd like to remove everything from the HTML file except this tag and its content:
<div id="issue_content">CONTENT</div>
Example file:
<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>
After the replacement, the file should look like this:
<div id="issue_content">CONTENT</div>
I tried this regular expression:
(<div id=\"issue_content\">)(.*?)(<\/div>)(.*?)
but it removes only the tag <div id="issue_content">CONTENT</div> and its content, which is exactly the part I want to keep.
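The reason is that a replace operation substitutes only the text the pattern matched. Below is a small sketch using Python's `re` module (an assumption for illustration; Notepad++ uses its own Boost-based engine, which behaves the same way for this pattern). The sample text is the example file from the question. Matching the `<div>` and replacing it with nothing deletes exactly the part that should survive, and the trailing lazy `(.*?)` matches the empty string, so it adds nothing.

```python
import re

# Example file from the question.
html = """<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>"""

# The pattern from the question. It matches only the div itself:
# the final (.*?) is lazy and matches the empty string.
pattern = r'(<div id=\"issue_content\">)(.*?)(<\/div>)(.*?)'

# Replacing the match with "" removes the div and keeps everything else.
result = re.sub(pattern, '', html)
print(result)
```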
Upvotes: 0
Reputation: 147146
This regex should do what you want. On the Replace tab, set Search Mode to Regular expression, check the ". matches newline" box, and place the cursor at the beginning of the document.
^.*?(<div[^>]*id="issue_content">.*?<\/div>).*$
Replace with \1.
Note that this regex works only if no other <div> tags are nested inside the one you are looking for.
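As a rough check outside Notepad++, the same pattern can be run with Python's `re` module, where the `re.DOTALL` flag stands in for the ". matches newline" option (an assumption: Python's engine is not the one Notepad++ uses, but it treats this pattern the same way). The `nested` string is a made-up input showing the nesting caveat: the lazy `.*?` stops at the first `</div>`, which belongs to the inner tag.

```python
import re

# Example file from the question.
html = """<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>"""

# re.DOTALL lets "." match newlines, like Notepad++'s ". matches newline".
pattern = re.compile(r'^.*?(<div[^>]*id="issue_content">.*?</div>).*$', re.DOTALL)

# The whole text is matched; replacing it with group 1 keeps only the div.
kept = pattern.sub(r'\1', html)
print(kept)  # <div id="issue_content">CONTENT</div>

# Hypothetical nested input: the match ends at the inner </div>.
nested = '<div id="issue_content"><div>inner</div> tail</div>'
print(pattern.sub(r'\1', nested))  # <div id="issue_content"><div>inner</div>
```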
Upvotes: 1
Reputation: 2317
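Try this PHP, where $str is the variable holding your HTML content:
// The s modifier lets . match newlines; the lazy .*? stops at the first </div>.
if (preg_match('/<div id="issue_content">(.*?)<\/div>/is', $str, $matches)) {
    echo $matches[0]; // the whole tag; $matches[1] holds only its content
}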
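For comparison, here is a sketch of the same extraction in Python (a swapped-in language, not PHP): `re.search` plays the role of `preg_match`, `re.IGNORECASE` corresponds to PHP's `i` modifier and `re.DOTALL` to `s`. `group(0)` is the whole match (like `$matches[0]`) and `group(1)` the captured content (like `$matches[1]`). The `two` string is a made-up input showing why a greedy `(.*)` is risky: it runs to the last `</div>` rather than the first.

```python
import re

# Example file from the question.
html = """<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>"""

match = re.search(r'<div id="issue_content">(.*?)</div>', html,
                  re.IGNORECASE | re.DOTALL)
if match:
    print(match.group(0))  # the whole tag
    print(match.group(1))  # only the content: CONTENT

# Hypothetical input with a second div: greedy (.*) overshoots.
two = '<div id="issue_content">A</div><div>B</div>'
greedy = re.search(r'<div id="issue_content">(.*)</div>', two)
print(greedy.group(1))  # A</div><div>B
```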
Upvotes: 0
Reputation: 10929
You can change your regex to the following. The idea is that it matches the whole text but captures the string you want in a group, so you can replace everything with that group.
This is the regex (enter it without surrounding / delimiters, which Notepad++ does not use):
[\s\S]*?(<div id="issue_content">[^>]+>)[\s\S]*
It matches everything from the start up to the string you want, captures that string in group 1, and then matches everything after it. Because [^>]+> stops at the first >, this works only when CONTENT itself contains no tags.
When replacing, replace with group 1:
$1
Now only your string remains.
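A minimal sketch of this approach in Python's `re` (an assumption for testing outside Notepad++). `[\s\S]` matches any character, including newlines, so no DOTALL flag is needed. This sketch uses `[\s\S]*` for the tail so it also works when the div is the last thing in the file. The `inline` string is a made-up input showing the limitation of `[^>]+>`: it stops at the first `>` inside the content.

```python
import re

# Example file from the question.
html = """<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>"""

# Lazy prefix, captured div, then the rest of the text.
pattern = r'[\s\S]*?(<div id="issue_content">[^>]+>)[\s\S]*'

# Replace the whole match with group 1.
kept = re.sub(pattern, r'\1', html, count=1)
print(kept)  # <div id="issue_content">CONTENT</div>

# Hypothetical input whose content contains a tag: the capture is cut short.
inline = '<div id="issue_content">a <b>bold</b> word</div>'
print(re.sub(pattern, r'\1', inline, count=1))  # <div id="issue_content">a <b>
```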
Upvotes: 0