Strip specific HTML tags using Notepad++

Question

I'd like to hear if anyone can help me replace the HTML markup in my large XML file.

The XML file has my own schema and it's all fine. But I need to remove ,

Justin Morgan · Accepted Answer

Quoting from an answer I posted yesterday:

I've heard some very good things about Beautiful Soup, HTML Purifier, and the HTML Agility Pack, which use Python, PHP, and .NET, respectively. Trust me--save yourself some pain and use those instead.

I strongly advise you not to use regex for this. No sane regex is going to work, or probably even come close to working. However, a decent XML parser can do this fairly easily. I'm not sure what programming languages you have access to, but if you can use PHP, .NET or another programming language, you can use the above parsers to find each span, style, div, and p and remove attributes or the entire tags.
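To make the parser route concrete, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree rather than any of the libraries named above, and it assumes the HTML sits inside the XML as well-formed child elements (not escaped text or CDATA). The names REMOVE, STRIP_ATTRS, clean and strip_markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: these sets mirror the tags named in the answer.
REMOVE = {"span", "style", "div"}   # drop the tag, keep what it wraps
STRIP_ATTRS = {"p"}                 # keep the tag, drop its attributes


def _append_text(parent, index, text):
    """Attach text immediately before position `index` in parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def clean(elem):
    """Unwrap REMOVE elements and clear attributes on STRIP_ATTRS elements."""
    i = 0
    while i < len(elem):
        child = elem[i]
        clean(child)  # handle descendants first
        if child.tag in REMOVE:
            kids = list(child)
            elem.remove(child)
            # Splice the child's text, children and tail into its old position.
            _append_text(elem, i, child.text)
            for offset, kid in enumerate(kids):
                elem.insert(i + offset, kid)
            _append_text(elem, i + len(kids), child.tail)
            i += len(kids)
        else:
            if child.tag in STRIP_ATTRS:
                child.attrib.clear()
            i += 1


def strip_markup(xml_text):
    """Parse, clean below the root element, and serialize back to a string."""
    root = ET.fromstring(xml_text)
    clean(root)
    return ET.tostring(root, encoding="unicode")


sample = '<doc><p class="x">Hello <span style="c">world</span> again</p></doc>'
print(strip_markup(sample))
```

The sketch leaves the root element alone and keeps the text inside removed elements, including the CSS inside a style element, so adjust it if that content should go as well.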

jQuery has some good functionality for DOM-manipulation like you're describing, and you can use it to generate HTML which you then cut and paste.

If you absolutely must use regex, you could try this:

Pattern: <\s*/?\s*(span|style|div)\b[^>]*?>
Replacement: (nothing)
Pattern: <\s*p\b[^>]*?>
Replacement:
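
For anyone who wants to try the two patterns outside Notepad++, here is a rough Python sketch using the standard re module. The function names are made up for illustration. The replacement for the second pattern is not shown above, so the sketch takes it as an argument; passing "<p>" (which would strip attributes from opening p tags) is only an assumption.

```python
import re

# The two patterns from the answer, unchanged.
TAGS = re.compile(r"<\s*/?\s*(span|style|div)\b[^>]*?>")
P_OPEN = re.compile(r"<\s*p\b[^>]*?>")


def strip_tags(text):
    """Delete opening and closing span, style and div tags; keep their content."""
    return TAGS.sub("", text)


def rewrite_p_open(text, replacement):
    """Replace every opening p tag with `replacement`, taken literally."""
    # A function is used so backslashes in `replacement` are not interpreted.
    return P_OPEN.sub(lambda m: replacement, text)


sample = '<div id="a"><p class="x">Hi <span style="c">there</span></p></div>'
# Assumption: "<p>" stands in for the replacement missing from the answer.
cleaned = rewrite_p_open(strip_tags(sample), "<p>")
print(cleaned)
```

Like any regex over markup, this breaks on input such as a ">" inside an attribute value, which is part of why a parser is the safer choice.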
