tenub

Reputation: 3446

Using RegEx to Capture All Links & In Between Text From A String

<Link to: http://www.someurl(.+)> maybe some text here(.*) <Link: www.someotherurl(.+)> maybe even more text(.*)

Given that this is all on one line, how can I match, or better yet extract, all full URLs and the text between them? For this example, I want to extract:

http://www.someurl(.+) . maybe some text here(.*) . www.someotherurl(.+) . maybe even more text(.*)

Basically, <Link.*:.* would start each link capture and > would end it. All text after a link would then be captured as well, up to the start of the next link (if there is one).

I have tried:

preg_match_all('/<Link.*?:.*?(https|http|www)(.+?)>(.*?)/', $v1, $m4);

but I need a way to capture the text after the closing >. The problem is that there may or may not be another link after the first one (of course there could also be no links to begin with!).

Upvotes: 0

Views: 95

Answers (2)

CrayonViolent

Reputation: 32532

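$string = "<Link to: http://www.someurl(.+)> maybe some text here(.*) <Link: www.someotherurl(.+)> maybe even more text(.*)";
// Split on each <Link ...: url> tag. PREG_SPLIT_DELIM_CAPTURE keeps the captured URL
// in the result; PREG_SPLIT_NO_EMPTY drops the empty piece before the first tag.
$parts = preg_split('~<link(?: to)?:\s*([^>]+)>~i', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
echo "<pre>";
print_r($parts);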

Output:

Array
(
    [0] => http://www.someurl(.+)
    [1] =>  maybe some text here(.*) 
    [2] => www.someotherurl(.+)
    [3] =>  maybe even more text(.*)
)
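If you need each link paired with the text that follows it, here is a minimal sketch. It assumes you drop PREG_SPLIT_NO_EMPTY so that the pieces strictly alternate (text, link, text, link, ..., text). The helper name split_links is hypothetical, not part of the code above.

```php
<?php
// Hypothetical helper: pairs each link with the text that follows it.
function split_links(string $input): array
{
    // Without PREG_SPLIT_NO_EMPTY the result always alternates:
    // even indexes hold text, odd indexes hold the captured URLs.
    $parts = preg_split('~<link(?: to)?:\s*([^>]+)>~i', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return []; // regex error; treated here as "no links"
    }

    $pairs = [];
    for ($i = 1; $i < count($parts); $i += 2) {
        // N links give 2N + 1 pieces, so $parts[$i + 1] always exists.
        $pairs[] = ['link' => trim($parts[$i]), 'text' => trim($parts[$i + 1])];
    }
    return $pairs;
}

$pairs = split_links('<Link to: http://www.someurl(.+)> maybe some text here(.*) <Link: www.someotherurl(.+)> maybe even more text(.*)');
print_r($pairs);
```

With this layout a string without any links gives an empty array, and two links directly next to each other give an entry whose text is an empty string, instead of shifting the later entries.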

Upvotes: 2

Casimir et Hippolyte

Reputation: 89557

You can use this pattern:

// The "i" flag makes the match case-insensitive, so "<Link" in the sample string is matched too.
preg_match_all('~<link\b[^:]*:\s*\K(?<link>[^\s>]++)[^>]*>\s*(?<text>[^<]++)~i',
               $txt, $matches, PREG_SET_ORDER);

foreach($matches as $match) {
    printf("<br/>link: %s\n<br/>text: %s", $match['link'], $match['text']);
}
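Note that (?<text>[^<]++) needs at least one character, so a link that is immediately followed by another link would be skipped. A possible variant, only a sketch: it assumes the "i" flag for the capitalised "<Link" in the question and uses *+ so the text may be empty. The third link (www.third.example) is made up for the test string.

```php
<?php
// Test string: the question's sample, plus a made-up third link placed
// directly after the second one to exercise the "no text" case.
$txt = '<Link to: http://www.someurl(.+)> maybe some text here(.*) <Link: www.someotherurl(.+)><Link: www.third.example> tail';

// Assumed tweaks: "i" flag (case-insensitive) and [^<]*+ (text may be empty).
$pattern = '~<link\b[^:]*:\s*\K(?<link>[^\s>]++)[^>]*>\s*(?<text>[^<]*+)~i';

preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);

$result = [];
foreach ($matches as $match) {
    // rtrim() drops the whitespace left in front of the next "<Link".
    $result[] = ['link' => $match['link'], 'text' => rtrim($match['text'])];
}
print_r($result);
```

If there are no links at all, $matches is simply empty and the loop does nothing.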

Upvotes: 0
