Reputation: 8035
I need to scrape a web page that has a JavaScript array embedded in inline JavaScript code, such as:
<script>
var videos = new Array();
videos[0] = 'http://myvideos.com/video1.mov';
videos[1] = ....
....
</script>
What's the easiest way to approach this and end up with a PHP array of these video urls?
Edit: All videos are .mov extension.
Upvotes: 2
Views: 1370
Reputation: 65342
This is a bit more complicated, but it will match only links that really appear in the form videos[0] = 'http://myvideos.com/video1.mov';
$original = file_get_contents($url);
$tmp = str_replace(array("\r", "\n"), '', $original);
$pattern = '/\<script\>\s*var\ videos.*?((\s*videos\[\d+\]\ \=\ .http\:\/\/.*?\;\s*?)+)(.*?)\<\/script\>/';
$a = preg_match_all($pattern, $tmp, $matches);
unset($tmp);
if (!$a) die("no matches");
$pattern = "/videos\[\d+\]\ \=\ /";
$matches = preg_split($pattern, $matches[1][0]);
$final = array();
while (sizeof($matches) > 0) {
    $match = trim(array_shift($matches));
    if ($match == '') continue;
    $final[] = substr($match, 1, -2); // strip the surrounding quotes and trailing semicolon
}
unset($matches);
print_r($final);
After feedback from the OP, here is the simplified version:
$original = file_get_contents($url);
$pattern = '/http\:\/\/.*?\.mov/';
$a = preg_match_all($pattern, $original, $matches);
if (!$a) die("no matches");
print_r($matches[0]);
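One caveat with the simplified pattern: it will also pick up .mov links that appear anywhere else on the page, not just in the inline script. A minimal sketch of a two-step variant that first isolates the script block and only then collects the URLs — the hardcoded $html sample below stands in for the fetched page (file_get_contents($url)):

```php
<?php
// Sample page; in practice this would come from file_get_contents($url).
$html = "<html><body><script>
var videos = new Array();
videos[0] = 'http://myvideos.com/video1.mov';
videos[1] = 'http://myvideos.com/video2.mov';
</script>
<a href='http://othersite.com/unrelated.mov'>ignore me</a></body></html>";

$urls = array();
// Step 1: isolate the <script> block that declares the videos array
// (/s lets the dot match across newlines).
if (preg_match('/<script>.*?var videos.*?<\/script>/s', $html, $script)) {
    // Step 2: collect every .mov URL inside that block only.
    preg_match_all('/http:\/\/[^\'"]+?\.mov/', $script[0], $m);
    $urls = $m[0];
}
print_r($urls);
```

This keeps the unrelated .mov link out of the result because it sits outside the matched script block.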
Upvotes: 1
Reputation: 2425
You can scrape this by reading the page with file_get_contents, then retrieving the URLs with a regex. This is the simplest way I know, especially if you know the file extensions of your videos. Example:
<?php
$file = file_get_contents('http://google.com');
$pattern = '/http:\/\/([a-zA-Z0-9\-\.]+\.(fr|com))/i'; // (fr|com) is an alternation matching either TLD
preg_match_all($pattern, $file, $matches);
var_dump($matches);
Upvotes: 1