Fey

Reputation: 71

preg_match_all doesn't work as expected

I want to get matches on a webpage based on the following regular expression: <a href="/title/.*/">(.*)</a>. I tested it on regexpal.com (an online regular expression tester) and it works fine. However, when I use it in PHP, I can't find any matches. The statement I use in PHP is:

preg_match_all("/<a href=\"\/title\/.*\/\">(.*)<\/a>/", $content, $matches);

I checked $content and it's correct. Is there anything wrong with my statement? Thanks!

Upvotes: 1

Views: 1533

Answers (3)

CodeAngry

Reputation: 12985

Please, please... for the love of God, don't delimit regular expressions that deal with URLs or HTML with /. You then have to escape every slash all over the place. It's terrible. Look here:

preg_match_all('~<a href="/title/[^">]+/">(.*?)</a>~si', $content, $matches);
  1. Use single quotes, so you no longer need to escape double quotes. Why use double quotes when there are no "{$variables}" to expand?
  2. Delimit the regex with any non-reserved character (~ here). For URLs and HTML, / is the worst choice, as it drags you into escape-redundancy hell.
  3. Use the si modifiers for HTML: tags can span multiple lines, and without s the . in .+? or .*? does not match newlines. You also need i for case insensitivity, since tags may be written in uppercase.
  4. Avoid using .+? inside attributes, as it may capture entire tags. Use a negated class of break characters instead, like [^">]+ in my pattern above, so the match doesn't overrun the attribute if the HTML is broken.
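A minimal sketch of the pattern above run against a small, made-up HTML snippet (the $content string and the /title/ IDs are hypothetical, not the asker's page). The second link is split across two lines and written in uppercase, so it is only matched because of the s and i modifiers; the /name/ link is skipped.

```php
<?php
// Hypothetical sample HTML standing in for the real page.
$content = <<<HTML
<a href="/title/tt0000001/">Movie One</a>
<A HREF="/title/tt0000002/">Movie
Two</A>
<a href="/name/nm0000001/">Some Actor</a>
HTML;

// ~ as delimiter, so / needs no escaping; single quotes, so " needs no escaping.
// s: the dot also matches newlines; i: case-insensitive tag and attribute names.
// [^">]+ cannot run past the closing quote of the href attribute.
preg_match_all('~<a href="/title/[^">]+/">(.*?)</a>~si', $content, $matches);

// $matches[1] holds the captured link texts (2 entries for this input).
print_r($matches[1]);
```

Removing the s modifier stops the second link from matching, because .*? can then no longer cross the line break inside its text.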

There are more ways to improve this, but this should do it.

Hope it helps.

Upvotes: 7

Adam

Reputation: 1090

preg_match_all("/<a href\=\"\/title\/.*\/\">(.*?)<\/a>/", $content, $matches);

I would try:

preg_match_all('/<a href\=".title.*">(.*?)<\/a>/', $content, $matches);

for brevity.

Upvotes: 0

Ωmega

Reputation: 43673

You need to make your regex pattern lazy (non-greedy) by adding ? after each .*:

preg_match_all("/<a href=\"\/title\/.*?\/\">(.*?)<\/a>/", $content, $matches);
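To see what the added ? changes, here is a small sketch comparing the greedy and lazy versions on one hypothetical line containing two links. Without the s modifier, . stops at newlines, so the greedy version only overruns when several links sit on the same line.

```php
<?php
// Hypothetical input: two links on a single line.
$content = '<a href="/title/a/">First</a> | <a href="/title/b/">Second</a>';

// Greedy: .* runs as far as it can, so a single match spans both links
// and captures only the text of the last one.
$greedy = preg_match_all("/<a href=\"\/title\/.*\/\">(.*)<\/a>/", $content, $g);

// Lazy: .*? stops at the earliest possible end, giving one match per link.
$lazy = preg_match_all("/<a href=\"\/title\/.*?\/\">(.*?)<\/a>/", $content, $l);

echo $greedy, "\n"; // number of greedy matches
print_r($g[1]);
echo $lazy, "\n";   // number of lazy matches
print_r($l[1]);
```

For this input the greedy pattern finds 1 match capturing "Second", while the lazy one finds 2 matches, "First" and "Second".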

Upvotes: 1
