huffmaster

Reputation: 115

Regular expression that removes attributes from tags

I'm looking for a regular expression that will accept HTML input and remove all attributes inside each tag while leaving the tag itself intact. For example, I want this...

<p class="test" id="TestParagraph">This is some test text right here.</p>

To become this...

<p>This is some test text right here.</p>

Any help would be much appreciated.

Upvotes: 0

Views: 441

Answers (3)

womp

Reputation: 116977

You really don't want to use regex for this. HTML is not a regular language, so you cannot guarantee that your actual text won't mimic tags and be stripped as well. Whatever expression you come up with, there will always be cases that break it.

I would suggest using the Html Agility Pack for any HTML manipulation that you need to do.
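Html Agility Pack is a .NET library. To illustrate the parser-based approach without guessing at its API, here is a minimal sketch that swaps in Python's standard-library `html.parser` instead. The names `AttributeStripper` and `strip_attributes` are made up for this example, and note that this parser reports tag names in lowercase, so the output is normalized that way.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emits the parsed input, dropping every attribute (illustrative name)."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; exactly as written.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}>")  # attrs is deliberately ignored

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag} />")  # self-closing tags such as <br />

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_attributes(html):
    """Return html with all tag attributes removed (hypothetical helper)."""
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Because the parser understands quoted attribute values, input like `<a title="a > b">link</a>` comes out as `<a>link</a>`, which is the kind of case a regex tends to get wrong.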

Upvotes: 5

harpo

Reputation: 43168

Apologies for not not answering the question.

You can start with this:

<(\w+)[^>]*>

and replace it with:

<$1>

Of course, this would be easy to break if the input contains scripts or CDATA sections, or in all sorts of other cases. But it may be close enough for your input set.
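As a rough sketch of how such a substitution behaves, here it is in Python, where the backreference is written `\1` rather than `$1`. The pattern used is `<(\w+)[^>]*>`, which only matches opening tags, so closing tags such as `</p>` pass through untouched. The function name `strip_attributes` is invented for illustration.

```python
import re

# Opening tag: '<', a tag name, then anything up to the next '>'.
# Closing tags like </p> do not match because '/' is not a word character.
TAG_WITH_ATTRS = re.compile(r"<(\w+)[^>]*>")


def strip_attributes(html):
    """Replace every opening tag with a bare version of itself (hypothetical helper)."""
    return TAG_WITH_ATTRS.sub(r"<\1>", html)


if __name__ == "__main__":
    # The example from the question.
    print(strip_attributes(
        '<p class="test" id="TestParagraph">This is some test text right here.</p>'
    ))
    # A '>' inside an attribute value ends the match early and leaves debris behind.
    print(strip_attributes('<a title="a > b">link</a>'))
```

The first call prints `<p>This is some test text right here.</p>`. The second prints `<a> b">link</a>`, one of the breakages mentioned above. Self-closing tags such as `<br />` also lose their slash and come out as `<br>`.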

Upvotes: 1

Doug

Reputation: 5318

HTML is not a regular language, and hence you will run into issues when trying to parse it with regular expressions. As Greg noted above, you might want to look at an HTML parser to do this work for you.

Enjoy!

Upvotes: 2
