Ali

Reputation: 1058

Extract all MP3 and OGG Links from String with preg_match_all

I was trying to create a regular expression to extract all MP3/OGG links from an example string, but I couldn't. This is the example string I'm trying to extract the MP3/OGG links from:

this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href="http://seconddomain.com/files/music.mp3" target="_blank">Download</a>

And the PHP part:

$Word = "this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href="http://seconddomain.com/files/music.mp3" target="_blank">Download</a>";


$Pattern = '/href=\"(.*?)\".mp3/';
preg_match_all($Pattern,$Word,$Matches);
print_r($Matches);

I tried these too:

$Pattern = '/href="([^"]\.mp3|ogg)"/';
$Pattern = '/([-a-z0-9_\/:.]+\.(mp3|ogg))/i';

So I need your help to fix this code and extract all MP3/OGG links from that example string.

Thank you guys.

Upvotes: 1

Views: 1400

Answers (2)

Denis Kuzmin

Reputation: 800

..extract all MP3/OGG links from that example word.

e.g.:

(?<=https?://(.+)?)\.(mp3|ogg)
  • $1 - uri
  • $2 - extension

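Updated:

Unfortunately, PHP (tested on v5.5) rejects this pattern:

(?<=https?://(.+)?)\.(mp3|ogg)

PCRE requires a lookbehind assertion to have a fixed length, so compilation fails with:

  • Compilation failed: lookbehind assertion is not fixed length at offset n

Compare the two similar variants:

  • (?<=p1(.+)?)p2 - matches p2 only if p1 was matched before it; this is the form PHP rejects, because the lookbehind is not fixed length
  • p2(?=(.+)p3) - matches p2 only if p3 is matched after it; this works in PHP, since a lookahead may contain a variable-length part such as .+?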
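The difference between the two variants can be checked directly. The following is a minimal sketch, assuming a shortened sample string (not the full one from the question); it relies only on `preg_match_all()` returning `false` when a pattern fails to compile.

```php
<?php
// Shortened, hypothetical sample text with two audio links.
$Text = 'first http://domain.com/sample.mp3 then https://www.mydomain.com/sample2.ogg.';

// Lookbehind containing an unbounded (.+)?: PCRE refuses to compile it,
// so preg_match_all() returns false. The @ only hides the compile warning.
$Behind = @preg_match_all('#(?<=https?://(.+)?)\.(mp3|ogg)#i', $Text, $BehindMatches);

// Lookahead containing the same kind of unbounded part: compiles and runs.
$Ahead = preg_match_all('#https?://(?=(.+?)\.(mp3|ogg))#i', $Text, $AheadMatches);

var_dump($Behind);          // bool(false): the pattern did not compile
var_dump($Ahead);           // int(2): one match per link
print_r($AheadMatches[2]);  // the captured extensions: mp3, ogg
```

Checking the return value with `=== false` is the safest way to tell a compile failure apart from "zero matches", which returns `0`.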

For your sample:

//p2(?=.*p3)
preg_match_all("#https?://(?=(.+?)\.(mp3|ogg))#im", $Word, $Matches);

/*
[0] => Array
    (
        [0] => http://
        [1] => https://
        [2] => http://
    )

[1] => Array
    (
        [0] => domain.com/sample
        [1] => www.mydomain.com/sample2
        [2] => seconddomain.com/files/music
    )

[2] => Array
    (
        [0] => mp3
        [1] => ogg
        [2] => mp3
    )
 */
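Because the lookahead consumes nothing, the full match in `[0]` holds only the scheme, and the rest of each link sits in groups 1 and 2. A minimal sketch of gluing the pieces back together, assuming the corrected sample string (written in single quotes so the inner double quotes need no escaping) and a hypothetical `$Links` array:

```php
<?php
$Word = 'this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href="http://seconddomain.com/files/music.mp3" target="_blank">Download</a>';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all('#https?://(?=(.+?)\.(mp3|ogg))#im', $Word, $Matches, PREG_SET_ORDER);

$Links = [];
foreach ($Matches as $Match) {
    // $Match[0] = scheme, $Match[1] = host and path, $Match[2] = extension
    $Links[] = $Match[0] . $Match[1] . '.' . $Match[2];
}

print_r($Links);
```

One caveat: `.+?` runs up to the next `.mp3` or `.ogg` on the line, so a non-audio URL that appears before an audio link would be captured together with the text in between. For the sample string this does not occur.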

Upvotes: 1

MrLore

Reputation: 3780

To retrieve all links, you can use:

((https?:\/\/)?(\w+?\.)+?(\w+?\/)+\w+?\.(mp3|ogg))

((https?:\/\/)? Optional http:// or https://

(\w+?\.)+? Matches the domain groups, each followed by a dot

(\w+?\/)+ Matches the final domain group and any path segments, each followed by a forward slash

\w+?\.(mp3|ogg)) Matches a filename ending in .mp3 or .ogg

The string you provided contains several unescaped quotation marks. With those escaped and my regex added in:

$Word = "this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href=\"http://seconddomain.com/files/music.mp3\" target=\"_blank\">Download</a>";

$Pattern = '/((https?:\/\/)?(\w+?\.)+?(\w+?\/)+\w+?\.(mp3|ogg))/im';
preg_match_all($Pattern,$Word,$Matches);
var_dump($Matches[0]);

This produces the following output:

array (size=3)
  0 => string 'http://domain.com/sample.mp3' (length=28)
  1 => string 'https://www.mydomain.com/sample2.ogg' (length=36)
  2 => string 'http://seconddomain.com/files/music.mp3' (length=39)
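Since `\w` does not match hyphens, links such as http://my-site.com/a-b.ogg would slip through. As a looser alternative (a sketch, not the pattern used above), one could match "scheme, then anything that is not whitespace, a quote, or an angle bracket, up to the extension". The helper name `extractAudioLinks` is hypothetical, and the sketch assumes every link starts with http:// or https://.

```php
<?php
/**
 * Hypothetical helper: returns every http(s) link ending in .mp3 or .ogg.
 */
function extractAudioLinks(string $Text): array
{
    // [^\s"'<>]+? stops at whitespace, quotes and tag brackets, so it works
    // both for bare URLs and for URLs inside href="..." attributes.
    // \b keeps a trailing sentence dot (as in "sample2.ogg.") out of the match.
    $Pattern = '~https?://[^\s"\'<>]+?\.(?:mp3|ogg)\b~i';
    preg_match_all($Pattern, $Text, $Matches);
    return $Matches[0];
}

$Word = 'this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href="http://seconddomain.com/files/music.mp3" target="_blank">Download</a>';

print_r(extractAudioLinks($Word));
print_r(extractAudioLinks('hyphens: http://my-site.com/a-b.ogg'));
```

The trade-off is that this version requires the scheme, whereas the pattern above treats it as optional.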

Upvotes: 1
