RegEx in HTML split with preg-match

Question

I have a corrupt HTML page which I unfortunately can't parse with xml/xcode, so I turned to regex. I'm a regex beginner, and I can't get the right result.

Source

FIELD: VALUE

I want to get the value, and this is where I'm stuck:

$regex = '{]*(.*?)}';

Edit: as a result I want an array from which I can read the value, so the value is all I'm interested in.

I'm thankful for every hint.

cheers endo

Justin Morgan · Accepted Answer

There are some immediately visible problems with your regex; for example, ]* doesn't do what you think it does. But rather than suggest a different regex, let me urge you to do the sanest thing:

Don't use regex for this!

Trust me. Don't do it. Others will come in here and suggest new regex patterns, and their patterns will all be wrong. Regex isn't even up to the task of parsing clean HTML/XML, so trying to use it on arbitrarily corrupted code is a recipe for madness. Try HTML Tidy, which is made for this sort of thing. Depending on what's wrong with the HTML, a parser like HtmlPurifier or Beautiful Soup might also be able to work with it.

It may seem like a little more effort, but you'll save yourself time in the long run.
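To make the suggestion concrete, here is a minimal sketch of the parser route in PHP. The page's real markup is not shown in the question, so the `$html` sample and the XPath expression below are assumptions chosen for illustration only. The sketch runs HTML Tidy when the `tidy` extension happens to be loaded, then hands the result to PHP's built-in `DOMDocument`, which tolerates a fair amount of broken markup by itself; that second step is a swapped-in alternative, not something the answer above prescribes.

```php
<?php
// Hypothetical broken markup with unclosed <td> and <tr> tags.
// The real page structure is unknown; adjust this and the XPath to match it.
$html = '<table><tr><td><b>FIELD:</b><td>VALUE'
      . '<tr><td><b>OTHER:</b><td>SECOND</table>';

// Optional repair pass with HTML Tidy, only if the tidy extension is available.
if (function_exists('tidy_repair_string')) {
    $repaired = tidy_repair_string($html, ['output-xhtml' => true], 'utf8');
    if ($repaired !== false) {
        $html = $repaired;
    }
}

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Assumption: each label sits in a <td> containing a <b>, and the value is
// the next <td> in the same row.
$xpath = new DOMXPath($doc);
$values = [];
foreach ($xpath->query('//td[b]/following-sibling::td[1]') as $cell) {
    $values[] = trim($cell->textContent);
}

print_r($values);
```

The result is a plain array of values, which is what the question asks for. If the real page is damaged in a way neither Tidy nor `DOMDocument` can recover from, the XPath will simply match nothing, which is easier to detect than a regex that silently matches the wrong text.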
