Reputation: 115
I'm looking for a regular expression that will accept HTML input and remove all attributes inside each tag while leaving the tag itself intact. For example, I want this...
<p class="test" id="TestParagraph">This is some test text right here.</p>
To become this...
<p>This is some test text right here.</p>
Any help would be much appreciated.
Upvotes: 0
Views: 441
Reputation: 116977
You really don't want to use regex for this. HTML is not a regular language, so you cannot guarantee that your actual text won't mimic tags and be stripped as well. Whatever expression you come up with, there will always be cases that break it.
I would suggest using the Html Agility Pack for any HTML manipulation that you need to do.
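To illustrate what the parser-based approach looks like, here is a minimal sketch. Note that it does not use the Html Agility Pack (a .NET library); it swaps in Python's standard-library `html.parser` purely as a stand-in, and the names `AttributeStripper` and `strip_attributes` are made up for this example. It re-emits every tag without its attributes and passes text, entities, comments and declarations through.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit HTML with the attributes dropped from every tag."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; are reported
        # separately and can be written back exactly as they appeared.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}>")  # attrs are deliberately ignored

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag} />")  # self-closing form, e.g. <br />

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_attributes(html):
    """Return html with all tag attributes removed (hypothetical helper)."""
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<p class="test" id="TestParagraph">This is some test text right here.</p>'
    print(strip_attributes(sample))
```

Because the parser tokenizes the markup first, a `>` inside a quoted attribute value is not mistaken for the end of the tag. One caveat of this particular stand-in: `html.parser` lowercases tag names, so the output is not guaranteed to be byte-for-byte identical outside the attributes.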
Upvotes: 5
Reputation: 43168
Apologies for not answering the question.
You can start with this
<([A-Za-z][^\s/>]*)\s[^>]*>
replace with
<$1>
Of course, this is easy to break if the input contains scripts, CDATA sections, comments, or attribute values that themselves contain a > character. But it may be close enough for your input set.
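For anyone who wants to try this kind of substitution, here is a rough sketch in Python. The pattern in it is my own assumption: it requires the tag name to start with a letter and to be followed by whitespace, so closing tags such as `</p>` and tags without attributes are never touched. The function name `strip_attributes_regex` is hypothetical. It also shows one input that breaks it.

```python
import re

# Assumed pattern: "<", a tag name starting with a letter, whitespace,
# then anything up to the next ">". Closing tags start with "/" and
# attribute-free tags have no whitespace, so neither is matched.
TAG_WITH_ATTRS = re.compile(r"<([A-Za-z][^\s/>]*)\s[^>]*>")


def strip_attributes_regex(html):
    """Replace each matched tag with just its name (hypothetical helper)."""
    # Python spells the first capture group \1 where many engines use $1.
    return TAG_WITH_ATTRS.sub(r"<\1>", html)


if __name__ == "__main__":
    ok = '<p class="test" id="TestParagraph">This is some test text right here.</p>'
    print(strip_attributes_regex(ok))
    # -> <p>This is some test text right here.</p>

    # A ">" inside a quoted attribute value ends the match too early
    # and leaves debris from the attribute in the output.
    broken = '<p title="a > b">x</p>'
    print(strip_attributes_regex(broken))
    # -> <p> b">x</p>
```

The second case is the sort of input where a real HTML parser is the safer choice.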
Upvotes: 1
Reputation: 5318
HTML is not a regular language, and hence you will run into issues when trying to parse it with regular expressions. As Greg noted above, you might want to look at an HTML parser to do this work for you.
Enjoy!
Upvotes: 2