mgraph

Reputation: 15338

PHP regex to get a postal code from HTML

I am trying to get the postal code (91150) from this HTML:

<div>

<strong>Adresse de la commune : </strong><br>
HOTEL DE VILLE<br>91150&nbsp;ABBEVILLE-LA-RIVIERE&nbsp;
<p>Téléphone : <strong>01 64 95 67 37</strong><br>
Fax : <strong>01 69 58 80 17</strong></p>


<p>Localisation géographique : </p>
</div>

In PHP I did:

$page = file_get_contents($url);
preg_match('`<strong>Adresse de la commune : </strong>([^[0-9]]*)<p>`', $page, $regs);
var_dump($regs); // dumps an empty array

Can someone help? Thanks.

Upvotes: 0

Views: 233

Answers (5)

Shiplu Mokaddim

Reputation: 57650

Your postal code almost certainly has more consecutive digits than the phone and fax numbers, which are written in groups of two. Using this idea you can extract it:

preg_match('#Adresse de la commune\D+(\d{3,})#s', $page, $regs);
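
As a rough check, the pattern can be run against the markup from the question. This is only a sketch: `$html` is a stand-in for the fetched `$page`, and the `s` flag is left out here because `\D` already matches newlines.

```php
<?php
// Sample markup copied from the question; $html stands in for the fetched $page.
$html = <<<'HTML'
<div>
<strong>Adresse de la commune : </strong><br>
HOTEL DE VILLE<br>91150&nbsp;ABBEVILLE-LA-RIVIERE&nbsp;
<p>Téléphone : <strong>01 64 95 67 37</strong><br>
Fax : <strong>01 69 58 80 17</strong></p>
</div>
HTML;

// \D+ skips every non-digit (tags, text, newlines) after the label;
// \d{3,} then captures the first run of three or more digits.
$postalCode = null;
if (preg_match('#Adresse de la commune\D+(\d{3,})#', $html, $regs)) {
    $postalCode = $regs[1];
}

var_dump($postalCode); // string(5) "91150"
```

Checking the return value of preg_match before reading `$regs[1]` avoids an undefined-index notice on pages where the label is missing.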

Upvotes: 1

user1344280

Reputation:

With this one:

(?<![0-9])[0-9]{5}(?![0-9])

You can match any group of exactly 5 digits that is not part of a longer run of digits. You can then add more restrictions based on your input string. If there's always a non-breaking space afterwards you could use:

(?<![0-9])[0-9]{5}(?:&nbsp;)

Add as many other restrictions as you need to make your regex more accurate for your input. I used .NET regex syntax; I hope that's not an inconvenience.
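
The same lookarounds are also valid in PHP's PCRE functions, so a minimal sketch of how they might be used with preg_match_all and preg_match follows. The input string is hypothetical, and the second pattern swaps the non-capturing group for a lookahead so that `&nbsp;` is required but not included in the match.

```php
<?php
// Hypothetical input: one 5-digit code and one longer digit run.
$text = 'HOTEL DE VILLE<br>91150&nbsp;ABBEVILLE-LA-RIVIERE ref 1234567';

// The lookbehind and lookahead assert that no digit touches the 5-digit run.
preg_match_all('/(?<![0-9])[0-9]{5}(?![0-9])/', $text, $m);
print_r($m[0]); // only "91150"; no 5-digit slice of 1234567 is returned

// Variant requiring a trailing &nbsp;; the lookahead keeps it out of the match.
preg_match('/(?<![0-9])[0-9]{5}(?=&nbsp;)/', $text, $n);
echo $n[0], "\n"; // 91150
```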

Upvotes: 0

Joni

Reputation: 111239

Assuming the post code is always written as a word of 5 consecutive digits, the code below can extract it:

$matches = array();
preg_match("/\b(\d{5})\b/", $page, $matches);
echo $matches[1]; // 91150

The \b anchors force the post code to be a word of its own. This way, 5 digits inside a 6-digit phone number won't match, for example.
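
To illustrate what the word boundaries accept and reject, here is a small sketch over hypothetical strings (none of them come from the real page). Note the third case: `\b` treats letters and digits alike as word characters, so a code glued directly to a letter is not matched.

```php
<?php
// Hypothetical strings to show what \b accepts and rejects.
$samples = [
    '<br>91150&nbsp;ABBEVILLE', // digits bounded by ">" and "&": matches
    'Tel 016495',               // 6-digit run: no 5-digit word inside it
    '91150ABBEVILLE',           // a letter follows directly: no boundary
];

$results = [];
foreach ($samples as $s) {
    // Store the captured code, or null when the pattern does not match.
    $results[] = preg_match('/\b(\d{5})\b/', $s, $m) ? $m[1] : null;
}

var_dump($results); // "91150", NULL, NULL
```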

Upvotes: 0

Manuel

Reputation: 10303

Dump it like this:

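// preg_match() returns the number of matches (0 or 1), not the matched text.
// $regs is already taken by reference; call-time &$regs is a fatal error since PHP 5.4.
$postalcode = preg_match('`<strong>Adresse de la commune : </strong>([^[0-9]]*)<p>`', $page, $regs);
var_dump($postalcode);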

Upvotes: 0

Tchoupi

Reputation: 14681

I simplified it a bit. Would this work for you?

preg_match('/[^0-9]([0-9]{5})[^0-9]/', $page, $regs);

Upvotes: 0
