Ben Paton

Reputation: 1442

Using PHP to scrape image url from twitter page

I'm trying to scrape an image URL from Twitter, e.g. 'https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large', using PHP. I found the following PHP code. file_get_contents is working, but I don't think the regular expression is matching the URL. Can you help debug this code? Thanks in advance.

Here is a snippet from twitter which contains the image:

<div class="media-gallery-image-wrapper">
     <img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">
 </div>

Here is the php code:

<?php
    $url = 'http://t.co/s54fJgrzrG';
    $twitter_page = file_get_contents($url);
    preg_match('/(http:\/\/p.twimg.com\/[^:]+):/i', $twitter_page, $matches);
    $imgURL = array_pop($matches); 
    echo $imgURL;
?>

Upvotes: 2

Views: 1317

Answers (2)

AbsoluteƵERØ

Reputation: 7880

Something like this should provide a URL.

<?php
    $url = 'http://t.co/s54fJgrzrG';
    $twitter_page = file_get_contents($url);
    preg_match_all('!https?://pbs\.twimg\.com/[^:]+\.(jpg|png|gif)!i', $twitter_page, $matches);
    $img_url = $matches[0][0]; // first full match
    echo $img_url;
?>

The output is:

https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg
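If it helps to test the pattern without hitting the network, here is a minimal sketch that runs a similar expression against the markup quoted in the question. The helper name find_twimg_url is hypothetical (it is not part of the code above), and the sample HTML is simply the snippet from the question; the live page may be structured differently.

```php
<?php
// Sample markup copied from the question; in practice this string
// would come from file_get_contents($url).
$html = '<div class="media-gallery-image-wrapper">
    <img class="large media-slideshow-image" alt="" src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large" height="480" width="358">
</div>';

// Hypothetical helper: returns the first pbs.twimg.com image URL found
// in $html (without the ":large" suffix), or null when nothing matches.
function find_twimg_url(string $html): ?string
{
    $pattern = '!https?://pbs\.twimg\.com/[^:"\s]+\.(?:jpg|png|gif)!i';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[0]; // index 0 holds the full match
    }
    return null;
}

echo find_twimg_url($html), PHP_EOL;
// prints https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg
```

Returning null on a failed match avoids the undefined-offset notice you would get from reading $matches[0][0] when the page contains no image.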

Upvotes: 1

Chris Forrence

Reputation: 10104

It appears that your regular expression does not match the beginning of the URI. The host is 'pbs.twimg.com', not 'p.twimg.com', and the pattern only allowed http, while the image is served over https.

preg_match('/(https?:\/\/pbs\.twimg\.com\/[^:]+):/i', $twitter_page, $matches);
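One thing to watch when combining a pattern like this with array_pop from the question: capture groups are numbered by their opening parenthesis, so if the scheme is written as its own group, e.g. (http|https), that group becomes the last element of $matches. The sketch below, run against a shortened version of the markup from the question, illustrates the difference. Reading $matches[1] is an assumption about what you want (the URL without the trailing colon).

```php
<?php
// Shortened sample markup based on the snippet in the question.
$html = '<img src="https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg:large">';

// Group 1 is the whole URL; group 2 is the nested scheme alternation.
preg_match('/((http|https):\/\/pbs\.twimg\.com\/[^:]+):/i', $html, $matches);

$url  = $matches[1];         // the URL, regardless of how many groups follow
$last = array_pop($matches); // the last group, which here is the scheme

echo $url, PHP_EOL;  // prints https://pbs.twimg.com/media/BGZHCHwCEAACJ19.jpg
echo $last, PHP_EOL; // prints https
```

Indexing the group you want by number keeps the result stable if more groups are added to the pattern later.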

Upvotes: 1
