Narendhiran vignesh

Reputation: 23

How to extract a particular string from HTML source code using PHP

I'm trying to extract a particular string from the full HTML source of a page.

HTML Source: view-source:https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en

String to extract: https://instagram.fmaa1-2.fna.fbcdn.net/t51.2885-15/e35/18645014_163619900839441_7821159798480568320_n.jpg from the "og:image" meta property.

I have tried several methods, but none of them worked. Is there a way to grab the image link from the og:image meta property in the source code? After extracting it, I need to store the image URL in a variable. Expert help is needed.

Upvotes: 0

Views: 1285

Answers (4)

Azad Bhagat Singh

Reputation: 41

Try this code to scrape the web page. I used the simple_html_dom parser, which you can download from https://sourceforge.net/projects/simplehtmldom/files/

include_once("simple_html_dom.php");

$output_filename = "example_homepage.html";
$fp = fopen($output_filename, 'w');
$url = 'https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en';
$curl = curl_init();

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt ($curl, CURLOPT_FILE, $fp);
$result = curl_exec($curl);

curl_close($curl);
fclose($fp);

$html = file_get_html('example_homepage.html');

foreach($html->find('meta[property=og:image]') as $element) 
   echo $element->content . '<br>';

Upvotes: 0

mickmackusa

Reputation: 47934

Don't use preg_match_all() if you are only grabbing one substring. Loading a DOMDocument seems like overkill for this task.

By using \K you can reduce result array bloat.

Sample Input:

$input='<meta property="og:title" content="Instagram post by Narendiran blah blah" />
<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />
<meta property="og:description" content="8 Likes, 1 Comments - blah" />';

Method (Demo):

$url=preg_match('/"og:image"[^"]+"\K[^"]+/',$input,$out)?$out[0]:null;
echo $url;

Output:

https://instagram.fmma1-2.blah.jpg

The regex engine will run more efficiently with a negated character class, [^"]. (Pattern Demo)
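
To store the result in a variable, as the question asks, the same pattern can be wrapped in a small helper. This is only a sketch: og_content() is a hypothetical name, the sample URL is made up, and it assumes the property attribute comes before content and that both use double quotes, as in the sample input above.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the content of
// one Open Graph meta tag, or null when no such tag is found.
function og_content(string $html, string $property): ?string
{
    // preg_quote() escapes the property name so it is matched literally.
    $pattern = '/"og:' . preg_quote($property, '/') . '"[^"]+"\K[^"]+/';

    // \K discards everything matched so far, so $out[0] is only the URL.
    // html_entity_decode() turns &amp; in the attribute back into &.
    return preg_match($pattern, $html, $out) ? html_entity_decode($out[0]) : null;
}

// Made-up sample markup, for illustration only.
$input = '<meta property="og:title" content="Instagram post" />
<meta property="og:image" content="https://example.com/a.jpg?x=1&amp;y=2" />';

$imageUrl = og_content($input, 'image');
echo $imageUrl, "\n";
```

If the attribute order or quoting on the real page differs, the pattern will not match and the helper returns null, so check for null before using the value.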

Upvotes: 1

melkawakibi

Reputation: 881

In this code snippet I'm using DOMDocument to scrape the content attribute from the meta tag. It stores the values in an array, in case there is more than one, and returns it. Hope it works.

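   function get_img_url($url) { 

        // Suppress the warnings libxml raises for malformed real-world HTML 
        libxml_use_internal_errors(true); 

        // Create a new DOM object 
        $html = new DOMDocument(); 

        // Load the HTML page 
        $html->loadHTMLFile($url); 

        // Create an empty array 
        $imageArray = array(); 

        // Loop through each meta tag and keep only the og:image entries
        foreach($html->getElementsByTagName('meta') as $meta) { 
            if ($meta->getAttribute('property') === 'og:image') { 
                $imageArray[] = array('url' => $meta->getAttribute('content')); 
            } 
        } 

        // Return the list 
        return $imageArray; 
    } 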
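
If the markup is already in a string and only the first og:image value is needed, an XPath query is one possible alternative to the loop. The sketch below is an assumption-laden variant, not the code above: og_image_from_html() is a hypothetical name and the sample HTML is made up for illustration.

```php
<?php
// Hypothetical helper: returns the first og:image URL found in an HTML
// string, or null when the tag is missing.
function og_image_from_html(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select the content attribute of every <meta property="og:image"> tag.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//meta[@property="og:image"]/@content');

    return ($nodes !== false && $nodes->length > 0)
        ? $nodes->item(0)->nodeValue
        : null;
}

// Made-up sample markup, for illustration only.
$sample = '<html><head>'
    . '<meta property="og:title" content="Some post">'
    . '<meta property="og:image" content="https://example.com/photo.jpg">'
    . '</head><body></body></html>';

$imageUrl = og_image_from_html($sample);
echo $imageUrl, "\n";
```

This relies on the DOM extension, which is enabled in most PHP builds. The page source would still need to be fetched first, for example with cURL or file_get_contents().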

Upvotes: 0

BenM

Reputation: 53208

Assuming you have the markup inside a string with PHP, what's wrong with a RegEx?

preg_match_all('/<meta[^>]*property="og:image"[^>]*content="([^"]*)"[^>]*>/', $string, $matches);
echo $matches[1][0];

Demo

Disclaimer: more efficient regexes may be available.

Upvotes: 0
