Scrape web page and retrieve javascript variables

Question

I need to scrape a web page that has a JavaScript array embedded in inline JavaScript code, such as:

What's the easiest way to approach this and end up with a PHP array of these video URLs?

Edit: All videos have the .mov extension.

Eugen Rieck · Accepted Answer

This is a bit more complicated, but it picks up only links that really have the form videos[0] = 'http://myvideos.com/video1.mov';

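// $original holds the page source, e.g. $original = file_get_contents($url);
// Remove line breaks so one pattern can span the whole block of statements
$tmp=str_replace(array("\r","\n"),'',$original);
// After "var videos", capture the run of consecutive videos[N] = 'http://...'; statements
$pattern='~var\s+videos.*?((\s*videos\[\d+\]\s*=\s*.http://.*?;\s*)+)~';
$a=preg_match_all($pattern,$tmp,$matches);
unset($tmp);

if (!$a) die("no matches");

// Split the captured run on each "videos[N] = " prefix, leaving one quoted URL per piece
$pattern='/videos\[\d+\]\s*=\s*/';
$matches=preg_split($pattern,$matches[1][0]);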

$final=array();
while(sizeof($matches)>0) {
  $match=trim(array_shift($matches));
  if ($match=='') continue;
  $final[]=substr($match,1,-2);
}
unset($matches);

print_r($final);
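The preg_split/substr steps can also be collapsed into a single pattern with a capture group. This is a sketch, not part of the original answer: the helper name extractVideoAssignments and the sample markup are hypothetical, and it assumes each URL starts with http:// and is wrapped in matching single or double quotes.

```php
<?php
// Hypothetical helper (not from the answer): read the URL out of every
// videos[N] = '...'; assignment with one capture group instead of preg_split/substr.
function extractVideoAssignments(string $html): array
{
    // Group 1: the opening quote; group 2: the URL; \1 requires the same closing quote
    $pattern = '~videos\[\d+\]\s*=\s*([\'"])(http://[^\'"]+)\1\s*;~';
    if (!preg_match_all($pattern, $html, $matches)) {
        return []; // no assignments found (or a regex error)
    }
    return $matches[2];
}

// Hypothetical sample markup
$html = <<<'HTML'
<script type="text/javascript">
var videos = new Array();
videos[0] = 'http://myvideos.com/video1.mov';
videos[1] = "http://myvideos.com/video2.mov";
</script>
<a href="http://myvideos.com/other.mov">not an assignment</a>
HTML;

print_r(extractVideoAssignments($html));
```

For the sample fragment it returns only the two videos[N] URLs; the .mov link in the a tag is skipped because it is not an assignment.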

After feedback from the OP, here is a simplified version. Since every video ends in .mov, it just collects every http:// URL that ends in .mov:

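// $url is the address of the page to scrape
$original=file_get_contents($url);
// Any http:// URL ending in .mov; the slashes are escaped because "/" is the delimiter
$pattern='/http:\/\/.*?\.mov/';
$a=preg_match_all($pattern,$original,$matches);
if (!$a) die("no matches");
print_r($matches[0]);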
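The simplified pattern matches any .mov link anywhere in the page, so it will also pick up links outside the JavaScript (for example in an a href). One way to restrict it, sketched here as an assumption rather than part of the answer, is to run the same pattern over the text of each script element obtained with PHP's DOM extension (DOMDocument::loadHTML, getElementsByTagName). The function name and sample markup are hypothetical.

```php
<?php
// Hypothetical helper (not from the answer): run the simplified .mov pattern
// over inline <script> text only, so .mov links elsewhere are ignored.
function extractMovUrlsFromScripts(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        // textContent is the JavaScript source between <script> and </script>
        if (preg_match_all('/http:\/\/.*?\.mov/', $script->textContent, $m)) {
            $urls = array_merge($urls, $m[0]);
        }
    }
    // Keep each URL once and renumber the keys from 0
    return array_values(array_unique($urls));
}

// Hypothetical sample page
$html = <<<'HTML'
<html><body>
<a href="http://myvideos.com/trailer.mov">trailer</a>
<script type="text/javascript">
var videos = new Array();
videos[0] = 'http://myvideos.com/video1.mov';
videos[1] = 'http://myvideos.com/video2.mov';
videos[2] = 'http://myvideos.com/video1.mov';
</script>
</body></html>
HTML;

print_r(extractMovUrlsFromScripts($html));
```

Letting libxml find the script boundaries avoids encoding tag attributes and casing into the regex; the array_unique call is an added choice that the original code does not make.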
