v3nt

Reputation: 2912

preg_replace div (or anything) with class=removeMe

I'm trying to remove some elements with preg_replace but can't get it to work consistently. I would like to remove an element with a matching class. The problem is that the element may also have an ID or several classes.

For example, the element could be

<div id="me1" class="removeMe">remove me and my parent</div> 

or

<div id="me1" class="removeMe" style="display:none">remove me and my parent</div>

Is it possible to do this?

Any help appreciated! Dan.

Upvotes: 2

Views: 13204

Answers (3)

smottt

Reputation: 3330

With preg_replace:

$html = preg_replace('~<div([^>]*)class="(.*?)removeMe(.*?)">(.*?)</div>~im', '', $html);

Upvotes: 1

Michael

Reputation: 35341

I agree with MarcB. Overall, it's better to use a DOM when manipulating HTML. But here is a regex based on smottt's answer that might work:

$html = preg_replace('~<div([^>]*)(class\\s*=\\s*["\']removeMe["\'])([^>]*)>(.*?)</div>~i', '', $html);
  • Use [^>]* and [^<]* instead of .*. In my testing, .*? doesn't work in the attribute part of the tag: if a non-matching div comes before a matching div, the pattern matches from the first div, through everything in between, to the end of the last div. For example, it incorrectly matches this entire string: <div></div><b>hello</b><div class="removeMe">bar</div>
  • Take into account the fact that you can use single quotes with HTML attributes.
  • Also remember that there can be whitespace around the equals sign.
  • You should use the "m" modifier too so that it takes line breaks into account (see this page).
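
The over-matching described in the first bullet can be reproduced with a small sketch. The $loose pattern below is a hypothetical reconstruction of a ".*?" variant (it does not appear in the thread); $strict is the regex from this answer. The variable names are illustrative only.

```php
<?php
// Sample input from the bullet above: a non-matching div precedes the matching one.
$html = '<div></div><b>hello</b><div class="removeMe">bar</div>';

// Hypothetical loose variant: the lazy .*? after "<div" is free to run past
// the first ">" until it finds class="removeMe" in a later tag.
$loose = '~<div(.*?)class="removeMe"(.*?)>(.*?)</div>~i';

// Pattern from this answer: [^>]* keeps the attribute match inside one tag.
$strict = '~<div([^>]*)(class\\s*=\\s*["\']removeMe["\'])([^>]*)>(.*?)</div>~i';

$looseResult  = preg_replace($loose, '', $html);   // removes the whole string
$strictResult = preg_replace($strict, '', $html);  // removes only the last div

var_dump($looseResult, $strictResult);
```

Note that the strict pattern only matches when removeMe is the sole class in the attribute, so an element with several classes would still be missed.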

I added parentheses for clarity, but they aren't needed. Let me know if this works or not.

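EDIT: Actually, never mind, the "m" modifier won't do anything here: it only changes how ^ and $ match, and the pattern uses neither.

EDIT2: Improved the regex, but it still fails if there are any newlines in the div, because . does not match a newline unless the "s" modifier is set.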
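
Since the DOM route is recommended above but not shown, here is a minimal sketch of it using PHP's built-in DOMDocument and DOMXPath. The helper name removeByClass and the sample markup are assumptions made for illustration. The sketch assumes the input is an HTML fragment that starts with an element, so that libxml wraps it in html/body. It matches removeMe as one class among several and is not affected by newlines inside the element.

```php
<?php
// Hypothetical helper: remove every element whose class list contains $class.
function removeByClass(string $html, string $class): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep warnings about sloppy markup quiet
    $doc->loadHTML($html);              // libxml adds <html><body> around a fragment
    libxml_clear_errors();

    // Pad the class attribute with spaces so "removeMe" matches as a whole token.
    $xpath = new DOMXPath($doc);
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of <body>, dropping the wrapper libxml added.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = "<div id=\"keep\">keep</div>\n"
      . "<div id=\"me1\" class=\"foo removeMe\" style=\"display:none\">remove\nme</div>";
$clean = removeByClass($html, 'removeMe');
echo $clean;
```

To remove the enclosing element as well, the loop could target $node->parentNode instead of $node, at the cost of also dropping any siblings.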

Upvotes: 4

mario

Reputation: 145482

While this is still doable with a regular expression, it's much simpler with, e.g., QueryPath:

print qp($html)->find(".removeMe")->parent()->remove()->writeHTML();

Upvotes: 2
