Chama

Reputation: 199

PHP: Get specific links with preg_match_all()

I want to extract specific links from a website.

The links look like this:

<a href="1494761,offer-mercedes-used.html">

The links are always the same except for the brand name (mercedes in this case).

This works so far, but it only returns the first part of the link:

preg_match_all('/((\d{7}),offer-)/s',$inhalt,$results);

And this returns the first link together with the rest of the page :(

preg_match_all('/((\d{7}).*html)/s',$inhalt,$results);

Any ideas?

Note that I use preg_match_all() and not preg_match().

Thanks, Chama

Upvotes: 1

Views: 1228

Answers (2)

Mark Achée

Reputation: 517

Parsing XML/HTML with regex generally isn't a good idea, but if you're sure the markup will always be formatted consistently, this should return every link in the content:

/<a href="([^">]+)">/

This matches the example pattern you gave more closely, but I'm not sure what variations you might have:

/<a href="([0-9]{7},offer-[a-z]+-used\.html)">/
// [7 numbers],offer-[at least one letter]-used.html
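
As a rough sketch, the second pattern could be plugged into preg_match_all() like this. The sample markup assigned to $inhalt is made up for illustration; in the question it would hold the fetched page. Capture group 1 contains only the href value.

```php
<?php
// Hypothetical sample input; in the question, $inhalt holds the page source.
$inhalt = '<a href="1494761,offer-mercedes-used.html">A</a>'
        . '<a href="about.html">About</a>'
        . '<a href="2000123,offer-bmw-used.html">B</a>';

// Group 1 captures the href value: 7 digits, ",offer-", a brand, "-used.html".
preg_match_all(
    '/<a href="([0-9]{7},offer-[a-z]+-used\.html)">/',
    $inhalt,
    $results
);

// $results[0] holds the full matches, $results[1] the captured links.
print_r($results[1]);
```

The unrelated about.html link is skipped because it does not start with seven digits.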

Upvotes: 1

mario

Reputation: 145512

While .*? (the non-greedy form) would work, in either case you should use a more precise pattern.

Here [\w.-]+ would do. But [^">]+ might also be feasible, if the HTML source is consistent (or you specifically wish to ignore other variations).

preg_match_all('/((\d{7}),offer-[\w.-]+)/s',$inhalt,$results);
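
The difference between the greedy, non-greedy, and character-class variants can be checked with a small sketch. The input assigned to $inhalt is a made-up sample, and the variable names $greedy, $lazy, and $precise are only illustrative.

```php
<?php
// Hypothetical sample input with two matching links.
$inhalt = '<a href="1494761,offer-mercedes-used.html">A</a> '
        . '<a href="2000123,offer-bmw-used.html">B</a>';

// Greedy: .* runs to the LAST "html" in the input, giving one long match.
preg_match_all('/\d{7}.*html/s', $inhalt, $greedy);

// Non-greedy: .*? stops at the first "html" after each number.
preg_match_all('/\d{7}.*?html/s', $inhalt, $lazy);

// Character class: cannot run past quotes, spaces, or angle brackets.
preg_match_all('/\d{7},offer-[\w.-]+/', $inhalt, $precise);

echo count($greedy[0]), "\n"; // number of greedy matches
print_r($lazy[0]);
print_r($precise[0]);
```

On this input the greedy pattern swallows both links in a single match, while the other two return each link separately.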

Upvotes: 1
