Rytis Tamošiūnas

Reputation: 19

Get URL from <a> tag with PHP

Hi. I have a string that looks like this:

<a href="https://website.com/c4ca4238a0b923820dcc509a6f75849b/2020/11/55650-vaikospinta-54vnt-lape.pdf" target="_blank">55650-vaikospinta-54vnt-lape.pdf</a>

I'm trying to extract the URL with PHP. I want a result like this:

https://website.com/c4ca4238a0b923820dcc509a6f75849b/2020/11/55650-vaikospinta-54vnt-lape.pdf

Things I've tried:

  1. From another Stack Overflow question, I tried this:
$a = new SimpleXMLElement($FileURL);
$file = 'SimpleXMLElement.txt';
file_put_contents($file, $a);

But the result I get is just the text between <a> and </a>:

55650-vaikospinta-54vnt-lape.pdf

  2. Also from another Stack Overflow question, I tried using preg_match_all, like this:
$file = 'preg_match.txt';
preg_match_all('/<a[^>]+href=([\'"])(?<href>.+?)\1[^>]*>/i', $FileURL, $result);

if (!empty($result)) {
    # Found a link.
    file_put_contents($file, $result);
}

I have no idea how regex works (assuming that's regex), but the result I get is just:

ArrayArrayArrayArray

Thanks for any help!

Upvotes: 1

Views: 777

Answers (3)

TomiL

Reputation: 697

If you insist on using a regular expression (regex), this works:

<?php

$your_var = '<a href="https://website.com/c4ca4238a0b923820dcc509a6f75849b/2020/11/55650-vaikospinta-54vnt-lape.pdf" target="_blank">55650-vaikospinta-54vnt-lape.pdf</a>';
preg_match('/<a[^>]+href=([\'"])(?<href>.+?)\1[^>]*>/i', $your_var, $result);
$url = $result[2];

echo "Your URL: $url";

You can test your regex online, for example at https://regex101.com/
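As a side note, the pattern already defines a named group, so the match can be read as $result['href'] instead of $result[2]. The "ArrayArrayArrayArray" output in the question comes from passing the whole preg_match_all result to file_put_contents: it is an array of four nested arrays, and each nested array is written as the word "Array". Below is a minimal sketch of the same regex wrapped in a function; the name extractHrefWithRegex is made up for illustration.

```php
<?php

// Hypothetical helper: returns the href of the first <a> tag, or null if none matches.
function extractHrefWithRegex(string $html): ?string
{
    // Same pattern as in the answer; (?<href>...) is a named capture group.
    $pattern = '/<a[^>]+href=([\'"])(?<href>.+?)\1[^>]*>/i';

    if (preg_match($pattern, $html, $result) === 1) {
        return $result['href']; // same value as $result[2]
    }

    return null;
}

$link = '<a href="https://website.com/c4ca4238a0b923820dcc509a6f75849b/2020/11/55650-vaikospinta-54vnt-lape.pdf" target="_blank">55650-vaikospinta-54vnt-lape.pdf</a>';

echo extractHrefWithRegex($link), PHP_EOL;
```

To write the URL to a file as in the question, pass the returned string (not the whole match array) to file_put_contents.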

Upvotes: 1

user1597430

Reputation: 1146

XPath way:

$href = (string) simplexml_load_string($html)->xpath('//a/@href')[0]->href;
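For comparison, the SimpleXMLElement attempt from the question can also be made to work without XPath. Casting the element to a string returns its text content, which is why only the file name came out; attributes are read with array syntax instead. A minimal sketch, assuming the input is well-formed XML as it is in the question:

```php
<?php

$FileURL = '<a href="https://website.com/c4ca4238a0b923820dcc509a6f75849b/2020/11/55650-vaikospinta-54vnt-lape.pdf" target="_blank">55650-vaikospinta-54vnt-lape.pdf</a>';

$a = new SimpleXMLElement($FileURL);

$text = (string) $a;          // text between <a> and </a>
$href = (string) $a['href'];  // value of the href attribute

echo $href, PHP_EOL;
```

Note that SimpleXML throws an exception on markup that is not valid XML, so this only suits input shaped like the example.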

Upvotes: 0

Devsi Odedra

Reputation: 5322

You can use DOMDocument with loadHTML and getElementsByTagName, as below:

$str = '<a href="https://website.com/c4ca4238a0b923820dcc509a6f75849b/2020/11/55650-vaikospinta-54vnt-lape.pdf" target="_blank">55650-vaikospinta-54vnt-lape.pdf</a>';
$doc = new DOMDocument();
$doc->loadHTML($str);

// Collect every <a> element and print its href attribute, one per line.
$a = $doc->getElementsByTagName('a');
foreach ($a as $vals) {
    $href = $vals->getAttribute('href');
    echo $href, PHP_EOL;
}

If you don't want to use foreach, you can use $href = $a[0]->getAttribute('href'); instead.

The result will be:

https://website.com/c4ca4238a0b923820dcc509a6f75849b/2020/11/55650-vaikospinta-54vnt-lape.pdf
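If the string comes from less tidy HTML, loadHTML may emit warnings, and there may be no <a> tag at all. Here is a rough sketch of the same DOMDocument approach with those two cases handled; the function name firstHref is an assumption for illustration, not part of any library.

```php
<?php

// Hypothetical helper: returns the href of the first <a> element, or null if there is none.
function firstHref(string $html): ?string
{
    // loadHTML rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $anchors = $doc->getElementsByTagName('a');
    if ($anchors->length === 0) {
        return null;
    }

    return $anchors->item(0)->getAttribute('href');
}

$str = '<a href="https://website.com/c4ca4238a0b923820dcc509a6f75849b/2020/11/55650-vaikospinta-54vnt-lape.pdf" target="_blank">55650-vaikospinta-54vnt-lape.pdf</a>';

echo firstHref($str), PHP_EOL;
```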

Upvotes: 2
