Incorrect regex for divs

Question

I'm trying to extract divs from many of my website's files using regexes, but I can't get it to work.
This is what I'm trying to do: http://regexr.com/38to9

I need the regex to match the following divs, the ones with class data plus the classes plainText and extData, including everything inside them. There are no other divs nested inside the ones I listed.
I've been stuck on this for about two hours and I can't figure it out.
Here it is for anyone who doesn't want to visit that site:


    Something



     Text in here

With this regex:

\s*\s*(...)\s*<\/div>

The first div is highlighted, but the second one isn't, and I don't get any results from preg_match_all in PHP either. Does it have anything to do with the fact that I'm using tabs in the second div but not in the first one?
(I wrote it quickly on the website to see if it works.)

zx81 · Accepted Answer

You have a great non-regex answer, but you should also know that you were really close...

With all the usual disclaimers about parsing HTML with regex, adding the DOTALL modifier (?s) to your original expression matches what you want:

(?s)\s*(.*?)\s*<\/div>

See demo.

How does this work?

The DOTALL modifier (?s) tells the engine that a dot can match a newline character. This is important for your (.*?) because the content of the divs can span several lines.
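To see the difference in isolation, here is a minimal sketch in Python, whose re module accepts the same inline (?s) flag. The markup and the opening-tag part of the pattern are hypothetical stand-ins, since the exact HTML from the question isn't reproduced here; only the class names data, plainText and extData come from the question.

```python
import re

# Hypothetical markup: a one-line div and a div whose content spans
# several lines and is indented with tabs. Assumed for illustration only.
html = (
    '<div class="data plainText">Something</div>\n'
    '<div class="data extData">\n'
    '\tText in here\n'
    '\tmore text\n'
    '</div>\n'
)

# Hypothetical pattern: the opening tag is an assumption; the
# \s*(.*?)\s*</div> tail follows the expression discussed above.
pattern = r'<div class="data (?:plainText|extData)">\s*(.*?)\s*</div>'

# Without DOTALL, "." stops at every newline, so the multi-line div
# cannot be matched and only the one-line div is found.
without_dotall = re.findall(pattern, html)

# With (?s) at the start, "." also matches newlines, so the lazy group
# can run across lines up to the first closing </div>.
with_dotall = re.findall('(?s)' + pattern, html)

print(without_dotall)
print(with_dotall)
```

Note that the tabs themselves are not the problem: \s* already absorbs leading tabs and newlines. What fails without DOTALL is the dot crossing a newline in the middle of the content.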
