Simon

How to preg_match this text/javascript in a loaded HTML page

When I view the source of an HTML page, I see this inside a text/javascript script tag:

playlist = [{
    title: "",
    thumnail: "//example.com/folder/c9cc7f89fe5c168551bca2111d479a3e_1515576875.jpg",
    source: "https://examp.com/360/HX62.mp4?authen=exp=1517246689~acl=/82vL3DDTye4/*~hmac=977cefd9de63a29fde25c856e0fdfd2f",
    sourceLevel: [
        {
            source: "https://examp.com/360/HX62.mp4?authen=exp=1517246689~acl=/82vL3DDTye4/*~hmac=977cefd9de63a29fde25c856e0fdfd2f",
            label: '360p'
        },
        {
            source: "https://examp.com/480/HX62.mp4?authen=exp=1517246689~acl=/SuCa7NnGEhM/*~hmac=80bc89a07b1f4ed87d584a89c623e946",
            label: '480p'
        },
        {
            source: "https://examp.com/720/HX62.mp4?authen=exp=1517246689~acl=/SuCa7NnGEhM/*~hmac=80bc89a07b1f4ed87d584a89c623e946",
            label: '720p'
        },
    ],
}];

I want to extract the source and label strings, so I wrote this code:

$page = curl ('https://example.com/video-details.html')
preg_match ('#sourceLevel:[{source: "(.*?)",label: \'360p\'},{source: "(.*?)",label: \'480p\'},{source: "(.*?)",label: \'720\'}#', $page, $source);
$data360 = $source[1];
$data480 = $source[2];
$data720 = $source[3];
echo $data360. '<br/>' .$data480. '<br/>' .$data720. '<br/>';

I know something in it is probably wrong, because I'm new to PHP. I hope someone can help me correct my code. Many thanks!

Answers (1)

trincot

You need to:

  • escape braces and square brackets in your regular expression, as they have special meanings in regexes;
  • escape the single quotes inside the string literal, since you chose the single quote as its delimiter (which you corrected after I wrote this);
  • allow for the whitespace that can appear between tokens in your page string (e.g. before and after {).
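The first and third points can be checked on a small sample. This is a sketch that assumes the page text has the same layout as in the question (shortened here to a single entry): `preg_quote()` escapes the literal `sourceLevel: [`, and `\s*` absorbs the line breaks and indentation between tokens. It also shows that an unescaped `[` opens a character class that is never closed, so the pattern fails to compile and `preg_match()` returns `false`.

```php
<?php
// Shortened sample of the page text (assumption: same layout as in the question).
$page = "sourceLevel: [\n    {\n        source: \"https://examp.com/360/HX62.mp4\",\n        label: '360p'\n    },\n]";

// 1) preg_quote() escapes regex metacharacters such as [ (and the colon) in literal text.
$literal = preg_quote('sourceLevel: [', '~');   // gives: sourceLevel\: \[

// 2) \s* between tokens tolerates the newlines and indentation in the page.
$pattern = '~' . $literal . '\s*\{\s*source\s*:\s*"(.*?)"\s*,\s*label\s*:\s*\'(.*?)\'~';

// Without escaping, "[" starts a character class with no closing "]":
// the pattern does not compile and preg_match() returns false (@ hides the warning).
$broken = @preg_match('~sourceLevel: [{source~', $page);

if (preg_match($pattern, $page, $m)) {
    echo $m[2], ' => ', $m[1], PHP_EOL;   // 360p => https://examp.com/360/HX62.mp4
}
```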

I would also suggest matching each source/label pair separately, so that you still get all of them even when there are not exactly three.

Here is the suggested code:

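// Match every { source: "...", label: '...' } object, allowing whitespace around each token
preg_match_all('~\{\s*source\s*:\s*"(.*?)"\s*,\s*label\s*:\s*\'(.*?)\'\s*\}~', 
               $page, $sources);

// $sources[1] holds the URLs and $sources[2] the labels; use the labels as keys
$sources = array_combine($sources[2], $sources[1]);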

This leaves $sources as an associative array keyed by label:

[
    "360p" => "https://examp.com/360/HX62.mp4?authen=exp=1517246689~acl=/82vL3DDTye4/*~hmac=977cefd9de63a29fde25c856e0fdfd2f",
    "480p" => "https://examp.com/480/HX62.mp4?authen=exp=1517246689~acl=/SuCa7NnGEhM/*~hmac=80bc89a07b1f4ed87d584a89c623e946",
    "720p" => "https://examp.com/720/HX62.mp4?authen=exp=1517246689~acl=/SuCa7NnGEhM/*~hmac=80bc89a07b1f4ed87d584a89c623e946"
]
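To print the URLs the way the question does, the regex above can be wrapped in a small function. This is a sketch: `extract_sources()` is a hypothetical name, the sample `$page` is a shortened copy of the playlist from the question with the 720p entry deliberately left out, and the `??` operator (PHP 7+) makes a missing quality print a placeholder instead of triggering an undefined-key warning.

```php
<?php
// Hypothetical helper wrapping the preg_match_all() regex from this answer.
function extract_sources(string $page): array
{
    preg_match_all(
        '~\{\s*source\s*:\s*"(.*?)"\s*,\s*label\s*:\s*\'(.*?)\'\s*\}~',
        $page,
        $matches
    );
    // Labels ($matches[2]) become the keys, URLs ($matches[1]) the values.
    return array_combine($matches[2], $matches[1]);
}

// Sample input: a shortened copy of the playlist, without the 720p entry.
$page = <<<'JS'
playlist = [{
    sourceLevel: [
        { source: "https://examp.com/360/HX62.mp4", label: '360p' },
        { source: "https://examp.com/480/HX62.mp4", label: '480p' },
    ],
}];
JS;

$sources = extract_sources($page);

// ?? supplies a placeholder when a quality is missing (720p here).
foreach (['360p', '480p', '720p'] as $label) {
    echo $label, ': ', $sources[$label] ?? '(not available)', '<br/>', PHP_EOL;
}
```

Because every source/label pair is matched on its own, the same code works unchanged if the page offers two qualities or five.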
