alexbever

Reputation: 11

How to remove everything except one HTML tag and its content in Notepad++?

I open an HTML page in Notepad++.

The HTML page contains a lot of markup, in particular this tag:

<div id="issue_content">CONTENT</div>

I'd like to remove everything from the HTML file except this tag and its content.

Example file:

<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>

After deleting, the contents of the file should look like this:

<div id="issue_content">CONTENT</div>

I tried this regular expression: (<div id=\"issue_content\">)(.*?)(<\/div>)(.*?)
but it removes only the tag <div id="issue_content">CONTENT</div> and its content, which is exactly the part I want to keep.
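To see why this backfires, the same pattern can be run through Python's re module. This is only a sketch: Notepad++ uses the Boost regex engine rather than Python's, although groups and lazy quantifiers behave the same way here, and the replacement field is assumed to have been left empty. The pattern matches only the div itself, because the final lazy (.*?) is free to match nothing:

```python
import re

# The example file from the question.
html = """<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>"""

# The asker's pattern, unchanged.
pattern = r'(<div id=\"issue_content\">)(.*?)(<\/div>)(.*?)'

m = re.search(pattern, html)
print(m.group(0))        # the whole match is just the div
print(repr(m.group(4)))  # the trailing lazy (.*?) matched nothing

# Replacing the match with an empty string (assumed) deletes the div itself
# and leaves everything around it.
print(re.sub(pattern, '', html))
```

Since the surrounding text is never part of the match, no replacement string can remove it; the pattern has to consume the text before and after the div as well.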

Upvotes: 0

Views: 1254

Answers (3)

Nick

Reputation: 147146

This regex should do what you want. In the Replace dialog, set the search mode to Regular expression, check the ". matches newline" box, and position the cursor at the beginning of the document.

^.*?(<div[^>]*id="issue_content">.*?<\/div>).*$

Replace with \1.

Note that this regex works only if the <div> you are looking for contains no nested <div> tags: the lazy .*? stops at the first </div> it finds.
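To check the behaviour outside the editor, here is a minimal Python sketch of the same find-and-replace. It assumes that Python's re.DOTALL flag stands in for the ". matches newline" option and that ^ without re.MULTILINE plays the role of starting at the top of the document; keep_issue_content is a hypothetical helper name. The second call illustrates the nesting caveat:

```python
import re

# Nick's pattern; the \1 replacement keeps only the captured div.
PATTERN = r'^.*?(<div[^>]*id="issue_content">.*?</div>).*$'


def keep_issue_content(html: str) -> str:
    """Hypothetical helper: drop everything except the issue_content div."""
    # re.DOTALL lets . match newlines (Notepad++'s ". matches newline" box).
    # Without re.MULTILINE, ^ anchors only at the start of the whole string.
    return re.sub(PATTERN, r'\1', html, flags=re.DOTALL)


html = """<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>"""
print(keep_issue_content(html))

# Nested divs break it: the lazy .*? stops at the inner </div>.
print(keep_issue_content('<div id="issue_content"><div>x</div> y</div>'))
```

If the div is not present at all, nothing matches and the text is returned unchanged.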

Upvotes: 1

Daniel Williams

Reputation: 2317

If you can use PHP instead of Notepad++, try this, where $str holds your HTML content:

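// The s modifier lets . match newlines; the lazy .*? stops at the first </div>.
if (preg_match('/<div id="issue_content">(.*?)<\/div>/is', $str, $matches)) {
    echo $matches[0]; // the whole tag; $matches[1] holds only the inner content
}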

Upvotes: 0

Poul Bak

Reputation: 10929

You can change your regex so that it matches the entire file but captures the string you want in a group. You then replace the whole match with that group:

This is the regex:

[\s\S]*?(<div id="issue_content">[^>]+>)[\s\S]*

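It lazily matches everything from the start of the file up to the string you want, captures that string in Group 1, and then matches everything after it ([\s\S] matches any character, including newlines). Because [^>]+> stops at the first >, the group ends at the closing </div> only when the content itself contains no other tags.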

When replacing, replace with Group 1:

$1

Now only your string remains.
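The same idea can be tried in Python as a minimal sketch. It assumes the pattern is entered without the /.../ delimiters and ends in [\s\S]* (zero or more) so that a div at the very end of the file still matches. Python writes the group reference as \1 rather than $1, and keep_issue_content is a hypothetical helper name. The last call shows how a tag inside the content cuts the capture short:

```python
import re

# [\s\S] matches any character, newlines included, so no DOTALL flag is
# needed. [^>]+> consumes the content up to and including the first '>',
# which is the end of </div> only if the content holds no other tags.
PATTERN = r'[\s\S]*?(<div id="issue_content">[^>]+>)[\s\S]*'


def keep_issue_content(html: str) -> str:
    """Hypothetical helper: replace the whole text with capture group 1."""
    return re.sub(PATTERN, r'\1', html, count=1)


html = """<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>"""
print(keep_issue_content(html))

# The div may also be the last thing in the file.
print(keep_issue_content('<p>x</p><div id="issue_content">CONTENT</div>'))

# A tag inside the content stops the capture at its first '>'.
print(keep_issue_content('<div id="issue_content">A <b>bold</b> word</div>'))
```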

Upvotes: 0
