
regular expression to strip attributes and values from html tags

Hi guys, I'm very new to regex. Can you help me with this?

I have a string like this: "<input attribute='value' >", where attribute='value' could be anything, and I want to do a preg_replace to get just <input />

How do I specify a wildcard to replace any number of any characters in a string?

Like this? preg_replace("/<input.*>/",$replacement,$string);

Many thanks

Upvotes: 0

Answers (4)

Jan Goyvaerts

Reputation: 22009

If I understand the question correctly, you have the code:

preg_replace("/<input.*>/",$replacement,$string);

and you want us to tell you what you should use for $replacement to delete what was matched by .*

You have to go about this the other way around. Use capturing groups to capture what you want to keep, and reinsert that into the replacement. E.g.:

preg_replace('/(<input).*(>)/', '$1$2', $string);

Of course, you don't really need capturing groups here, as you're only reinserting literal text. But the above shows the technique, in case you want to do this in a situation where the tag can vary. This is a better solution:

preg_replace("/<input [^>]*>/","<input />",$string);

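The negated character class is more specific than the dot. This regex will work if there are two HTML tags in the string. Your original regex won't: the greedy .* runs on to the last > in the string, so everything from the first <input up to the end of the final tag gets replaced.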
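A minimal PHP sketch of the capturing-group technique for the case where the tag name varies. The sample string and the tag-name pattern ([a-zA-Z]+) are assumptions for illustration. It also runs the original greedy regex on the same two-tag string to show the problem described above.

```php
<?php
// Hypothetical sample input containing two different tags.
$html = "<input type='text' name='q'> and <img src='a.png' alt='x'>";

// ([a-zA-Z]+) captures the tag name; [^>]* consumes the attributes but can
// never run past the tag's own closing '>'. Single quotes keep PHP from
// touching '$1' before preg_replace sees it.
$stripped = preg_replace('/<([a-zA-Z]+)[^>]*>/', '<$1 />', $html);
echo $stripped, "\n"; // <input /> and <img />

// The original greedy pattern: .* runs to the LAST '>' in the string,
// so the text and the second tag are swallowed too.
$greedy = preg_replace('/<input.*>/', '<input />', $html);
echo $greedy, "\n"; // <input />
```

Closing tags such as </p> are left alone by this pattern, because a letter must follow the '<'.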

Upvotes: 0

Timothy Khouri

Reputation: 31885

Some people were close... but not 100%:

This:

preg_replace("/<input[^>]*>/", $replacement, $string);

should be this:

preg_replace("/<input[^>]*?>/", $replacement, $string);

You don't want that to be a greedy match.

Upvotes: 1

Kent Fredric

Reputation: 57384

What you have:

.*

will match "any character, and as many as possible".

What you mean is

[^>]+

which translates to "any character that's not a '>', and there must be at least one",

or alternatively,

.*?

which means "any character, but only enough to make this rule work"
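To see the difference between these options, here is a small PHP check that runs each pattern with preg_match on a made-up string containing two tags. The lazy [^>]*? line is included for comparison only.

```php
<?php
// Hypothetical sample: two tags back to back.
$s = "<input a='1'><input b='2'>";

preg_match('/<input.*>/', $s, $m1);     // greedy dot: runs to the LAST '>'
preg_match('/<input[^>]+>/', $s, $m2);  // negated class: stops at the first '>'
preg_match('/<input.*?>/', $s, $m3);    // lazy dot: stops at the first '>' that completes the match
preg_match('/<input[^>]*?>/', $s, $m4); // lazy negated class: same end point as [^>]*

echo $m1[0], "\n"; // <input a='1'><input b='2'>
echo $m2[0], "\n"; // <input a='1'>
echo $m3[0], "\n"; // <input a='1'>
echo $m4[0], "\n"; // <input a='1'>
```

Note that [^>]+ needs at least one character before the '>', so it will not match a bare <input>; [^>]* covers that case. Making [^>]* lazy does not change the result, because [^>] can never cross a '>' anyway.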

BUT DON'T.

Parsing HTML with regexps is Bad

Use any of the existing HTML parsers, DOM libraries, anything, just NOT NAÏVE REGEX.

For example:

 <foo attr=">">

will be grabbed wrongly by the regex as

'<foo attr=">', leaving '">' behind as ordinary text.

Which will lead you to this regex:

 <[a-zA-Z]+( [a-zA-Z]+=['"][^"']*['"])*>  etc. etc.

at which point you'll discover this lovely gem:

 <foo attr="'>\'\"">

and your head will explode.

(The syntax highlighter proves my point: it incorrectly thinks I've ended the tag.)
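Following the advice above, here is a minimal sketch of the original task done with PHP's DOM extension (DOMDocument) instead of a regex. The helper name strip_input_attributes() is hypothetical, and the sketch assumes the DOM extension is enabled and libxml is new enough to support the LIBXML_HTML_NOIMPLIED flag (2.7.8 or later).

```php
<?php
// Hypothetical helper: strip every attribute from <input> elements
// by parsing the HTML rather than pattern-matching it.
function strip_input_attributes(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    // Wrap the fragment in a <div> so it has a single root; the flags stop
    // libxml from adding <html>/<body> wrappers and a doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('input') as $input) {
        // Collect names first: removing attributes while iterating the
        // live attribute map can skip entries.
        $names = [];
        foreach ($input->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            $input->removeAttribute($name);
        }
    }

    // Serialize the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// The attribute value contains '>', which trips up the naive regexes above.
echo strip_input_attributes('<input attr=">" type="text">after'), "\n";
```

Because the parser understands quoting, the '>' inside attr=">" is treated as part of the value, not as the end of the tag. saveHTML() writes HTML void elements without a trailing slash, so expect <input> rather than <input />.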

Upvotes: 10

Tomalak

Reputation: 338406

preg_replace("/<input[^>]*>/", $replacement, $string);
// [^>] means "any character except the greater than symbol / right tag bracket"

This is really basic stuff; you should catch up on some reading. :-)

Upvotes: 0
