Using PHP to scrape an image URL from a Twitter page

I'm trying to use PHP to scrape an image URL from Twitter, e.g. 'https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large'. I found the following PHP code. file_get_contents is working, but I don't think the regular expression is matching the URL. Can you help me debug this code? Thanks in advance.

Here is a snippet from Twitter that contains the image:

<div class="media-gallery-image-wrapper">
    <img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">
</div>

Here is the php code:

<?php
    $url = 'http://t.co/s54fJgrzrG';
    $twitter_page = file_get_contents($url);
    preg_match('/(http:\/\/p.twimg.com\/[^:]+):/i', $twitter_page, $matches);
    $imgURL = array_pop($matches); 
    echo $imgURL;
?>

Upvotes: 2

Answers (2)

AbsoluteƵERØ

Reputation: 7880

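Something like this should extract the URL. The pattern accepts both http and https, escapes the dots in the host name, and stops before the ':large' suffix: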

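<?php
    $url = 'http://t.co/s54fJgrzrG';
    $twitter_page = file_get_contents($url);
    // Accept http or https and stop at ':', quotes or whitespace so the
    // match cannot run past the end of the src attribute.
    if (preg_match('!https?://pbs\.twimg\.com/[^:"\'\s]+\.(jpg|png|gif)!i', $twitter_page, $matches)) {
        echo $img_url = $matches[0];
    }
?>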

The output is:

https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg
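Because the page at http://t.co/s54fJgrzrG may no longer return this markup, here is a minimal sketch that runs the same kind of pattern against the HTML snippet from the question instead of a live fetch. It also keeps an optional size suffix such as ':large', on the assumption (based only on the snippet) that the suffix is a colon followed by word characters. The function name extract_twimg_url is hypothetical.

```php
<?php
// Hypothetical helper: return the first pbs.twimg.com image URL in $html,
// keeping an optional size suffix such as ":large", or null if none is found.
function extract_twimg_url(string $html): ?string
{
    // https? accepts both schemes; [^:"'\s]+ stops at a colon, quote or whitespace;
    // (?::\w+)? optionally keeps a suffix like ":large" (assumed format).
    $pattern = '!https?://pbs\.twimg\.com/[^:"\'\s]+\.(?:jpg|png|gif)(?::\w+)?!i';
    return preg_match($pattern, $html, $matches) === 1 ? $matches[0] : null;
}

// The snippet from the question, used instead of fetching the live page.
$html = '<div class="media-gallery-image-wrapper">
    <img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">
</div>';

echo extract_twimg_url($html), "\n";
// prints https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large
```

Dropping the (?::\w+)? part gives the URL without the suffix, which is what the answer above prints.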

Upvotes: 1

Chris Forrence

Reputation: 10104

It appears that your regular expression is missing part of the beginning of the URI: the images are served from pbs.twimg.com, not p.twimg.com. It also only matches http, while the image URL uses https.

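preg_match('/(https?:\/\/pbs\.twimg\.com\/[^:]+):/i', $twitter_page, $matches);

Writing the scheme as https? instead of a (http|https) group keeps the URL as the only capture group, so array_pop($matches) in the question's code returns the URL rather than the scheme.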
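Here is a small sketch that checks this diagnosis offline against the img tag from the question. The original pattern finds nothing, so $matches is empty and array_pop() returns null. A corrected pattern, written with https? so that the URL is the last capture group, returns the URL. The helper name last_capture is hypothetical; it simply mirrors the preg_match/array_pop steps of the question's code.

```php
<?php
// The img tag from the question's snippet (instead of fetching the live page).
$html = '<img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">';

// Hypothetical helper mirroring the question's code: run preg_match and
// return the last element of $matches, as array_pop() does (null if no match).
function last_capture(string $pattern, string $html): ?string
{
    preg_match($pattern, $html, $matches);
    return array_pop($matches);
}

// Original pattern: requires "http://p" followed by any character and
// "twimg.com", so it cannot match "https://pbs.twimg.com".
$original = '/(http:\/\/p.twimg.com\/[^:]+):/i';

// Corrected pattern: https? accepts both schemes, "pbs" is added and the
// dots are escaped. There is only one capture group, so array_pop() returns the URL.
$fixed = '/(https?:\/\/pbs\.twimg\.com\/[^:]+):/i';

var_dump(last_capture($original, $html)); // NULL
var_dump(last_capture($fixed, $html));    // string(47) "https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg"
```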

Upvotes: 1
