qutaibah

Reputation: 21

Regular expression to match an HTML tag with specific contents

I am trying to write a regular expression to capture this string:

<td style="white-space:nowrap;">###.##</td>

I can't even match it if I include the string as it is in the regex pattern! I am using preg_match_all(), but I can't find the correct pattern. I suspect that "white-space:nowrap;" is throwing off the matching in some way. Any ideas? Thanks ...

Upvotes: 2

Views: 400

Answers (4)

Alan Moore

Reputation: 75222

Are you aware that the regex argument to any of PHP's preg_ functions has to be double-delimited? For example:

preg_match_all('/foo/', $target, $results)

'...' are the string delimiters, /.../ are the regex delimiters, and the actual regex is foo. The regex delimiters don't have to be slashes, they just have to match; some popular choices are #...#, %...% and ~...~. They can also be balanced pairs of bracketing characters, like {...}, (...), [...], and <...>; those are much less popular, and for good reason.
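As a rough illustration of the point about delimiters, here is a minimal sketch in which the same regex, foo, is wrapped in several different delimiter pairs. The sample string and the variable names are made up for the example; each pattern should find the same matches.

```php
<?php
// Hypothetical sample input, not taken from the question.
$target = 'foo bar foo';

// The same regex (foo) wrapped in four different delimiter pairs.
$patterns = ['/foo/', '#foo#', '~foo~', '{foo}'];

$counts = [];
foreach ($patterns as $pattern) {
    // preg_match_all() returns the number of full matches, or false on error.
    $counts[$pattern] = preg_match_all($pattern, $target, $results);
}

print_r($counts);
```

Only the delimiters differ between the four patterns; the regex between them is identical.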

If you leave out the regex delimiters, the regex-compilation phase will probably fail and the error message will probably make no sense. For example, this code:

preg_match_all('<td style="white-space:nowrap;">###.##</td>', $s, $m)

...would generate this message:

 Unknown modifier '#'

It tries to use the first pair of angle brackets as the regex delimiters, and whatever follows the > as the regex modifiers (e.g., i for case-insensitive, m for multiline). To fix that, you would add real regex delimiters, like so:

preg_match_all('%<td style="white-space:nowrap;">###\.##</td>%i', $s, $m)

The choice of delimiter is a matter of personal preference and convenience. If I had used # or /, I would have had to escape those characters in the actual regex. I escaped the . because it's a regex metacharacter. Finally, I added the i modifier to demonstrate the use of modifiers and because HTML isn't case sensitive.
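To make that concrete, here is a small sketch that runs the corrected pattern against a made-up input string (the upper-case TD is there only to exercise the i modifier). It also shows preg_quote() as one possible alternative to escaping the literal text by hand; whether that is preferable is a judgment call, and the variable names are illustrative only.

```php
<?php
// Hypothetical sample input built around the string from the question.
$s = 'before <TD style="white-space:nowrap;">###.##</TD> after';

// Hand-escaped pattern with % as the delimiter and the i modifier.
$count = preg_match_all('%<td style="white-space:nowrap;">###\.##</td>%i', $s, $m);

// Alternative: let preg_quote() escape the literal text. The second
// argument names the delimiter so it is escaped too if it occurs in the text.
$literal = '<td style="white-space:nowrap;">###.##</td>';
$quoted = '%' . preg_quote($literal, '%') . '%i';
$countQuoted = preg_match_all($quoted, $s, $m2);

var_dump($count, $countQuoted, $m[0], $m2[0]);
```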

Upvotes: 1

MANCHUCK

Reputation: 2472

Why not try using DOMDocument instead? Then you do not have to worry about whether the HTML is formatted properly. Using the DOM classes will also improve readability and should perform well, since it's part of the PHP core rather than living in user space.
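A minimal sketch of that approach, assuming the cell sits inside an ordinary table and that matching on the exact style attribute value is what is wanted. The surrounding markup and variable names are invented for the example.

```php
<?php
// Hypothetical sample markup; the table around the cell is made up.
$html = '<table><tr><td style="white-space:nowrap;">###.##</td><td>other</td></tr></table>';

// Collect parser warnings internally instead of printing them,
// since real-world HTML is often not well formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$values = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    // Keep only cells whose style attribute equals the one in the question.
    if ($td->getAttribute('style') === 'white-space:nowrap;') {
        $values[] = $td->textContent;
    }
}

print_r($values);
```

This compares the style attribute as an exact string, so a cell with extra whitespace or additional style rules would be skipped; loosening that check is left to the reader.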

Upvotes: 4

JAL

Reputation: 21563

Did you see any warnings? If you use / as the regex delimiter, you have to escape some bits of that, namely the / before the td close tag (and the ., since it is a regex metacharacter). This seemed to work for me:

php > $string='cow cow cow    <td style="white-space:nowrap;">###.##</td> cat cat cat cat';
php > preg_match_all('/<td style="white-space:nowrap;">###\.##<\/td>/',$string,$result);
php > var_dump($result);
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(43) "<td style="white-space:nowrap;">###.##</td>"
  }
}

Upvotes: 1

Ricket

Reputation: 34057

When I'm having problems with regular expressions, I like to test them in real time with one of the following websites:

Upvotes: 2
