user1592380

Reputation: 36247

How to use PHP preg_split with an HTML string

I am trying to parse a badly formed HTML table. A couple of lines of it are:

  Food:</b> Yes<b><br>
  Pool: </b>Beach<b></b><b><br>
  Centre:</b> Yes<b><br>

After spending a lot of time on this with XPath, I think it is probably better to split the above text into lines using preg_split and parse from there.

The pattern I think would work uses:

<\b><\br>*: <\b>

My code is as follows:

$pattern='</b></br>*:</b>';           
$pattern=preg_quote($pattern,'#');
$chars = preg_split($pattern, $output);
print_r($chars);

I am getting the following error:

Delimiter must not be alphanumeric or backslash

What am I doing wrong?

Upvotes: 0

Views: 1285

Answers (2)

Cal

Reputation: 7157

Try this:

$pattern='</b></br>*:</b>';           
$pattern=preg_quote($pattern,'#');
$chars = preg_split('#'.$pattern.'#', $output);
print_r($chars);

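The preg_quote function only escapes characters that have a special meaning in a regular expression (plus the delimiter you pass as its second argument). It does not add the delimiters for you, so you have to wrap the quoted pattern yourself.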
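To see why the original call fails, it can help to print what preg_quote actually returns. The snippet below is a small sketch using the pattern from the question; the variable names $quoted and $wrapped are only illustrative. The quoted string starts with a backslash, which preg_split would then take as the delimiter, hence the error message. Note also that the * ends up escaped, so it matches a literal asterisk rather than acting as a quantifier.

```php
<?php
// Pattern copied from the question.
$pattern = '</b></br>*:</b>';

// preg_quote escapes regex metacharacters; it returns no delimiters.
$quoted = preg_quote($pattern, '#');
echo $quoted, "\n";

// The first character is a backslash, which is not a valid delimiter.
var_dump($quoted[0] === '\\');

// The delimiters have to be added by hand.
$wrapped = '#' . $quoted . '#';

// The '*' was escaped too, so it is now a literal asterisk.
var_dump(strpos($quoted, '\*') !== false);
```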

As other people will surely point out, using regular expressions is not a good way to parse HTML :)

Your regular expression is also not going to match what you hope. Here's a version that will probably work for your input:

$in = " Pool: </b>Beach<b></b><b><br>";
$out = explode(':', strip_tags($in));
$key = trim($out[0]);
$value = trim($out[1]);
echo "$key = $value\n";

This removes all the HTML, then splits on the colon, and then removes any surrounding whitespace.
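The same idea can be extended to the whole fragment. The following is a rough sketch, assuming every row ends in a <br> tag and contains a "key: value" pair as in the sample from the question; the $html and $rows names are made up for the example. It splits on <br> with preg_split, strips the remaining tags, and splits each line on the first colon only, so a value that itself contains a colon is not cut short.

```php
<?php
// Sample input taken from the question.
$html = "Food:</b> Yes<b><br>\nPool: </b>Beach<b></b><b><br>\nCentre:</b> Yes<b><br>";

$rows = [];
// Split on <br>, <br/> or <br />, case-insensitively, dropping empty pieces.
foreach (preg_split('#<br\s*/?>#i', $html, -1, PREG_SPLIT_NO_EMPTY) as $line) {
    // Remove the leftover <b> and </b> tags and surrounding whitespace.
    $text = trim(strip_tags($line));
    if ($text === '' || strpos($text, ':') === false) {
        continue; // skip lines without a key/value pair
    }
    // Limit of 2: split on the first colon only.
    [$key, $value] = explode(':', $text, 2);
    $rows[trim($key)] = trim($value);
}

print_r($rows);
```

For markup that is less regular than this sample, a DOM parser is likely to be more robust than splitting on tags.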

Upvotes: 1

KRyan

Reputation: 7598

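Your pattern needs to start and end with a delimiter. It looks like you intend to use #, if I'm reading this correctly, so you should have $pattern = '#</b></br>.*:</b>#'; and pass that to preg_split directly. Do not run it through preg_quote, because that would escape the .* as well.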

Also, you're mixing things up: * is not a standalone wildcard in regex. It is a quantifier meaning "zero or more of the preceding token". If you mean "any number of any characters," the pattern you need is .*, which I've used above.
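A short illustration of the difference, using made-up test strings rather than the HTML from the question:

```php
<?php
// 'b*' means "zero or more b", so 'ac' matches (zero b characters)...
$starMatchesEmpty = preg_match('#ab*c#', 'ac');

// ...but an arbitrary character between 'ab' and 'c' does not.
$starAsWildcard = preg_match('#ab*c#', 'abXc');

// '.*' means "zero or more of any character (except newline by default)".
$dotStar = preg_match('#ab.*c#', 'abXc');

var_dump($starMatchesEmpty, $starAsWildcard, $dotStar);
```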

Upvotes: 0
