How to extract hrefs from HTML with PHP

Question

Assume I have a valid HTML file that I have read into a string. Now I want to extract the links (hrefs) of the anchor elements, and I want to do it with pure regular expressions.

preg_match_all('/<a[^>]*href="(.+)">/', $html, $match);

What I want to receive is a string like this:

http://www.thisIsAHrefLinkIWantToHave.de

But instead I also receive strings like the following, which is the logical consequence of (.+) in the regex:

index?a=f">Link   Link 2   Link 3   Link 4   Link 5   Link 6

I know there are library-based approaches (see "PHP String Manipulation: Extract hrefs"), but I'd like a solution without those or any other libraries, just with regex. What do I have to do to fix my regex?

I thought about matching from the first " to the next ". But how do I write that pattern, or another pattern that solves the problem?
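The "from the first quote to the next quote" idea can be written with a negated character class. The following is only a minimal sketch: the sample markup in $html is made up for illustration, and the pattern assumes double-quoted href values on anchor tags.

```php
<?php
// Hypothetical sample markup, not taken from the original question.
$html = '<a href="http://www.example.de">Link</a> <a class="x" href="index?a=f">Link 2</a>';

// [^"]+ matches one or more characters that are not a double quote,
// so the capture group ends at the closing quote of the href value.
preg_match_all('/<a\s[^>]*href="([^"]+)"/i', $html, $match);

// $match[1] holds the captured href values, one per anchor.
print_r($match[1]);
```

Because the class excludes the quote itself, the capture cannot run past the end of the attribute, no matter how many anchors share a line. Single-quoted or unquoted attributes are not covered by this sketch.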

[EDIT:] Solution

preg_match_all('/<a[^>]*href="([A-Za-z0-9\/?=:&_.]+)?"/', $html, $match);

user2848613 · Accepted Answer

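Musa is correct that .+ is greedy: the + quantifier takes as many characters as it can, so the match runs on to the last "> on the line instead of stopping at the first closing quote. Try a restricted character class such as [A-Za-z0-9_]+ instead of .+, and add any further characters your URLs contain (such as / ? = : & and .).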
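To see the greedy behaviour in isolation, here is a small sketch that compares .+ with its lazy form .+? on one line containing two anchors. The input string and variable names are hypothetical, and the lazy quantifier is shown as an alternative, not as part of the accepted answer.

```php
<?php
// Hypothetical one-line input with two anchors.
$line = '<a href="first.html">One</a> <a href="second.html">Two</a>';

// Greedy: .+ consumes the rest of the line, then backtracks
// only as far as the LAST "> it can find.
preg_match('/<a[^>]*href="(.+)">/', $line, $greedy);

// Lazy: .+? expands one character at a time and stops
// at the FIRST "> that lets the pattern match.
preg_match('/<a[^>]*href="(.+?)">/', $line, $lazy);

echo $greedy[1], "\n"; // first.html">One</a> <a href="second.html
echo $lazy[1], "\n";   // first.html
```

A restricted character class has the same effect as the lazy quantifier here and also documents which characters are expected in the URL.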
