Reputation: 199
I want to extract specific links from a website.
The links look like this:
<a href="1494761,offer-mercedes-used.html">
The links are always the same except for the brand name (mercedes in this case).
This works so far, but it only delivers the first part of the link:
preg_match_all('/((\d{7}),offer-)/s',$inhalt,$results);
And this delivers the first link followed by the rest of the website :(
preg_match_all('/((\d{7}).*html)/s',$inhalt,$results);
Any ideas?
Note that I use preg_match_all() and not preg_match().
Thanks, Chama
Upvotes: 1
Views: 1228
Reputation: 517
Trying to parse XML/HTML with regex generally isn't a good idea, but if you're sure it will always be well formed, this should return any links in the content:
/<a href="([^">]+)">/
This will more closely match only the example pattern you gave, but I'm not sure what variations you might have:
/<a href="([0-9]{7},offer-[a-z]+-used\.html)">/
// [7 numbers],offer-[at least one letter]-used.html
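As a rough sketch of how the stricter pattern could be plugged into preg_match_all(): the sample markup in $inhalt below is made up for illustration, and the real page may well differ (uppercase brand names, extra attributes on the tag, and so on).

```php
<?php
// Hypothetical sample input; in the question, $inhalt holds the fetched page.
$inhalt = '<a href="1494761,offer-mercedes-used.html">Mercedes</a>'
        . '<a href="about.html">About</a>'
        . '<a href="1494762,offer-audi-used.html">Audi</a>';

// Capture group 1 is the href value: 7 digits, ",offer-", a brand, "-used.html".
preg_match_all('/<a href="([0-9]{7},offer-[a-z]+-used\.html)">/', $inhalt, $results);

// $results[0] holds the full matches, $results[1] only the captured links.
print_r($results[1]);
```

With this sample input, only the two offer links end up in $results[1]; the about.html link is skipped because it does not start with seven digits.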
Upvotes: 1
Reputation: 145512
While .*? would do (= less greedy), in both cases you should specify a more precise pattern. Here [\w.-]+ would do. But [^">]+ might also be feasible, if the HTML source is consistent (or you specifically wish to ignore other variations).
preg_match_all('/((\d{7}),offer-[\w.-]+)/s',$inhalt,$results);
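To make the difference between the greedy, lazy, and character-class variants concrete, here is a small comparison. The two-link $inhalt string is invented for the example, so treat it as a sketch rather than a statement about the real page.

```php
<?php
// Hypothetical sample input standing in for the page content in $inhalt.
$inhalt = '<a href="1494761,offer-mercedes-used.html">A</a> '
        . '<a href="1494762,offer-audi-used.html">B</a>';

// Greedy: .* runs on to the last "html" in the input, giving one oversized match.
preg_match_all('/(\d{7}).*html/s', $inhalt, $greedy);

// Lazy: .*? stops at the first "html" after each number.
preg_match_all('/(\d{7}).*?html/s', $inhalt, $lazy);

// Character class: [\w.-]+ cannot cross quotes, spaces, or angle brackets.
preg_match_all('/(\d{7}),offer-[\w.-]+/', $inhalt, $precise);

echo count($greedy[0]), "\n";
print_r($lazy[0]);
print_r($precise[0]);
```

On this input the greedy pattern produces a single match spanning both links, while the lazy and character-class patterns each return the two links separately. The character class is the more defensive choice, since it cannot run past the closing quote of the href even if "html" appears later in the page.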
Upvotes: 1