Reputation: 23
I'm trying to extract a particular string from the full HTML source of a page.
HTML Source: view-source:https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en
Need To Extract String: https://instagram.fmaa1-2.fna.fbcdn.net/t51.2885-15/e35/18645014_163619900839441_7821159798480568320_n.jpg
From the "og:image" Meta Property.
I have tried a few methods, but none of them worked. Is there a way to grab the image link from the og:image meta property in the source code? After extracting it, I need to store the image URL in a variable. Expert help is appreciated.
Upvotes: 0
Views: 1285
Reputation: 41
Try this code to scrape the webpage. I used simple_html_dom_parser, which you can download from https://sourceforge.net/projects/simplehtmldom/files/
// Requires simple_html_dom.php from the download link above
include_once("simple_html_dom.php");

$output_filename = "example_homepage.html";
$url = 'https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en';

// Download the page straight into a local file
$fp = fopen($output_filename, 'w');
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt($curl, CURLOPT_FILE, $fp);
curl_exec($curl);
curl_close($curl);
fclose($fp);

// Parse the saved file and read the og:image meta tag
$html = file_get_html($output_filename);
$image_url = null;
foreach ($html->find('meta[property=og:image]') as $element) {
    $image_url = $element->content; // keep the URL in a variable, as the question asks
    echo $image_url . '<br>';
}
Upvotes: 0
Reputation: 47934
Don't use preg_match_all() if you are only grabbing one substring. Loading a DOMDocument seems like overkill for this task.
By using \K you can reduce result array bloat.
Sample Input:
$input='<meta property="og:title" content="Instagram post by Narendiran blah blah" />
<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />
<meta property="og:description" content="8 Likes, 1 Comments - blah" />';
Method (Demo):
$url=preg_match('/"og:image"[^"]+"\K[^"]+/',$input,$out)?$out[0]:null;
echo $url;
Output:
https://instagram.fmma1-2.blah.jpg
The regex engine will run more efficiently by using a negated character class, [^"]. (Pattern Demo)
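To make the point about \K concrete, here is a small sketch comparing it with an ordinary capture group. The variable names $withGroup and $withK are made up for this illustration, and the input is a shortened copy of the sample above rather than real Instagram markup.

```php
<?php
// Shortened copy of the sample input; not real page source.
$input = '<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />';

// Capture group: the result holds the full match plus the captured group.
preg_match('/"og:image"[^"]+"([^"]+)/', $input, $withGroup);

// \K: everything matched before it is dropped from the reported match,
// so the result holds a single element containing only the URL.
preg_match('/"og:image"[^"]+"\K[^"]+/', $input, $withK);

echo count($withGroup), "\n"; // 2
echo count($withK), "\n";     // 1
echo $withK[0], "\n";         // https://instagram.fmma1-2.blah.jpg
```

Both patterns find the same URL; the \K version just leaves it in element 0 instead of element 1.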
Upvotes: 1
Reputation: 881
In this code snippet I'm using DOMDocument to scrape the content attribute from the og:image meta tag. It stores the results in an array, in case there is more than one, and returns it. Hope it works.
function get_img_url($url) {
    // Suppress libxml warnings caused by HTML5 markup
    libxml_use_internal_errors(true);
    // Create a new DOM object and load the HTML page
    $html = new DOMDocument();
    $html->loadHTMLFile($url);
    libxml_clear_errors();
    // Collect the content of every og:image meta tag
    $imageArray = array();
    foreach ($html->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('property') === 'og:image') {
            $imageArray[] = array('url' => $meta->getAttribute('content'));
        }
    }
    // Return the list
    return $imageArray;
}
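If only the first og:image value is needed, an XPath query may be a shorter route than looping over every meta tag. The following is a minimal sketch, not a drop-in replacement: og_image_from_html is a hypothetical helper name, and the sample markup and example.com URL are placeholders so the snippet runs without a network request.

```php
<?php
// Hypothetical helper: returns the first og:image URL in an HTML string, or null.
function og_image_from_html(string $source): ?string
{
    // Silence libxml warnings about HTML5 markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select the content attribute of any meta tag whose property is og:image.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//meta[@property="og:image"]/@content');

    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : null;
}

// Placeholder markup standing in for the downloaded page source.
$sample = '<html><head><meta property="og:image" content="https://example.com/photo.jpg" /></head><body></body></html>';
$imageUrl = og_image_from_html($sample);
echo $imageUrl, "\n";
```

For a live page, the HTML string would first have to be fetched (for example with cURL, as in the earlier answer) and then passed to the helper.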
Upvotes: 0