Alexander_F

Reputation: 2877

Regex used in preg_match() is not finding the expected URLs

I am trying to find a URL in a page.

The URL looks like this:

https://pos.xxxxxxxxxx.de/xxxxxxxxxxxx/app?funnel=login_box&tid=2001004

I have hidden the domain for the sake of posting.

Here is my code:

preg_match('~(https://pos.xxxxxxxxxx.de/xxxxxxxxxx/app\?funnel=login_box&tid=\d+)~', $text, $ans);

Nothing is found.

Then I tried this one:

preg_match('~(https://pos.xxxxxxxxxx.de/xxxxxxxxxx/app\?funnel=login_box&tid=)~', $text, $ans);
    

This tries to match only the fixed part of the link, but still nothing is found.

So I tried this one:

preg_match('~(https://pos.xxxxxxxxxx.de/xxxxxxxxxx/app\?funnel=login_box)~', $text, $ans);

Now I find some links, but why can't I match the whole link?

Upvotes: 0

Views: 453

Answers (3)

Pedro Lobito

Reputation: 99081

$html = "http://www.scroogle.org
http://www.scroogle.org/
http://www.scroogle.org/index.html
http://www.scroogle.org/index.html?source=library
You can surf the internet anonymously at https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.";

preg_match_all('/\b((?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?)/i', $html, $urls, PREG_PATTERN_ORDER);
$firstUrl = $urls[1][0]; // first matched URL; $urls stays an array so the loop below still works

This matches the URL in each of the following lines:

http://www.scroogle.org

http://www.scroogle.org/

http://www.scroogle.org/index.html

http://www.scroogle.org/index.html?source=library

You can surf the internet anonymously at https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.

To loop results you can use:

for ($i = 0; $i < count($urls[0]); $i++) {
    echo $urls[1][$i]."\n";
}

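This outputs the following (the last URL keeps its trailing period, because the file character class includes "."):

http://www.scroogle.org
http://www.scroogle.org/
http://www.scroogle.org/index.html
http://www.scroogle.org/index.html?source=library
https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.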
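The pattern also defines named groups (protocol, domain, file, parameters). Here is a rough sketch of reading them per match, assuming PREG_SET_ORDER is used in place of PREG_PATTERN_ORDER. The sample string and the variable names $pattern, $matches and $m are made up for illustration.

```php
<?php
// Illustrative input, not taken from the question.
$html = "See http://www.scroogle.org/index.html?source=library for details";

$pattern = '/\b((?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?)/i';

// PREG_SET_ORDER groups the results per match rather than per capture group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // Optional groups that did not take part in the match may be absent,
    // hence the ?? fallback to an empty string.
    echo $m['protocol'], ' | ', $m['domain'], ' | ',
         $m['file'] ?? '', ' | ', $m['parameters'] ?? '', "\n";
}
```

With this layout each $m holds one complete URL plus its parts, which tends to be easier to work with than indexing parallel arrays.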

cheers, Lob

Upvotes: 0

Dsda

Reputation: 577

preg_match('~(https://[^=]+=[^=]+=\d+)~i', $text, $m);

If the link is enclosed in ' or ", for example href="https://.....",

you can use this one instead: preg_match('~["\'](https://[^"\']+)["\']~i', $text, $m);

Upvotes: 0

Jacek Kaniuk

Reputation: 5229

In the HTML source, & is probably encoded as &amp;, so make the pattern accept both forms:

&(amp;)?

A reminder: an unescaped . matches any character, so you should escape it as \. when you mean a literal dot. That is not what breaks the match here, though.
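A minimal sketch of that suggestion, assuming the page source really does encode the ampersand as &amp;. The host pos.example.de, the path segment shop, and the $text sample are placeholders rather than the asker's real values.

```php
<?php
// Hypothetical snippet of raw HTML source; host and path are placeholders.
$text = '<a href="https://pos.example.de/shop/app?funnel=login_box&amp;tid=2001004">Login</a>';

// &(?:amp;)? accepts both a bare & and the HTML-encoded &amp;.
// The dots in the host are escaped so they match only a literal dot.
$pattern = '~https://pos\.example\.de/shop/app\?funnel=login_box&(?:amp;)?tid=\d+~';

if (preg_match($pattern, $text, $ans)) {
    // Decode entities to get the URL in the form a browser would request.
    $url = html_entity_decode($ans[0]);
    echo $url, "\n";
}
```

The non-capturing group (?:amp;)? keeps the capture indexes unchanged. An alternative would be to run html_entity_decode() on the whole page first and keep the original pattern.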

Upvotes: 3
