Jeyaganesh

Reputation: 1354

How to fetch the RSS feed URL of a website using PHP?

I need to find the RSS feed URL of a website programmatically, using either PHP or jQuery.

Upvotes: 4

Views: 18932

Answers (4)

Jonathan

Reputation: 321

A slightly smaller function that grabs the first available feed, whether it is RSS or Atom (most blogs offer both; this returns whichever is listed first).

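function getFeedUrl($url){
    // Fetch the page once; @ suppresses the warning if the request fails
    $html = @file_get_contents($url);
    if($html === false){
        return false;
    }
    // Match the first <link rel="alternate" type="application/(rss|atom)+xml" ... href="...">.
    // Assumes rel, type and href appear in that order and are double-quoted.
    if(preg_match('/<link\s+rel="alternate"\s+type="application\/(?:rss|atom)\+xml"[^>]*?\shref="([^"]*)"/i', $html, $matches)){
        return $matches[1];
    }
    return false;
}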

Upvotes: 1

hakre

Reputation: 197623

The general process has already been covered in the other answers (Quentin, DOOManiac), so here is some code (Demo):

<?php

$location = 'http://hakre.wordpress.com/';
$html = file_get_contents($location);
echo getRSSLocation($html, $location); # http://hakre.wordpress.com/feed/

/**
 * @link http://keithdevens.com/weblog/archive/2002/Jun/03/RSSAuto-DiscoveryPHP
 */
function getRSSLocation($html, $location){
    if(!$html or !$location){
        return false;
    }else{
        #search through the HTML, save all <link> tags
        # and store each link's attributes in an associative array
        preg_match_all('/<link\s+(.*?)\s*\/?>/si', $html, $matches);
        $links = $matches[1];
        $final_links = array();
        $link_count = count($links);
        for($n=0; $n<$link_count; $n++){
            $final_link = array(); #reset so attributes don't leak over from the previous <link>
            $attributes = preg_split('/\s+/s', $links[$n]);
            foreach($attributes as $attribute){
                $att = preg_split('/\s*=\s*/s', $attribute, 2);
                if(isset($att[1])){
                    $att[1] = preg_replace('/([\'"]?)(.*)\1/', '$2', $att[1]);
                    $final_link[strtolower($att[0])] = $att[1];
                }
            }
            $final_links[$n] = $final_link;
        }
        #now figure out which one points to the RSS file
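        $href = false; #initialize so the checks below never read an undefined variable
        for($n=0; $n<$link_count; $n++){
            #default missing attributes to '' so the lookups below are always defined
            $link = array_merge(array('rel' => '', 'type' => '', 'href' => ''), (array)$final_links[$n]);
            if(strtolower($link['rel']) == 'alternate'){
                if(strtolower($link['type']) == 'application/rss+xml'){
                    $href = $link['href'];
                }
                if(!$href and strtolower($link['type']) == 'text/xml'){
                    #kludge to make the first version of this still work
                    $href = $link['href'];
                }
                if($href){
                    if(preg_match('#^https?://#i', $href)){ #if it's absolute (http or https)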
                        $full_url = $href;
                    }else{ #otherwise, 'absolutize' it
                        $url_parts = parse_url($location);
                        #reuse the scheme of the page URL so https:// sites work too
                        $scheme = isset($url_parts['scheme']) ? $url_parts['scheme'] : 'http';
                        $full_url = "$scheme://$url_parts[host]";
                        if(isset($url_parts['port'])){
                            $full_url .= ":$url_parts[port]";
                        }
                        if($href[0] != '/'){ #it's a relative link on the domain ($href{0} syntax was removed in PHP 8)
                            $full_url .= dirname($url_parts['path']);
                            if(substr($full_url, -1) != '/'){
                                #if the last character isn't a '/', add it
                                $full_url .= '/';
                            }
                        }
                        $full_url .= $href;
                    }
                    return $full_url;
                }
            }
        }
        return false;
    }
}

See: RSS auto-discovery with PHP (archived copy).

Upvotes: 12

DOOManiac

Reputation: 6284

This is something a lot more involved than just pasting some code here. But I can point you in the right direction for what you need to do.

  1. First you need to fetch the page.
  2. Parse the string you get back, looking for the RSS autodiscovery <link> tag. You can either load the whole document into a DOM parser and traverse it, or use a regular expression, which is what I would do.
  3. Extract the href portion of the tag and you now have the URL to the RSS feed.
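
For comparison, here is a minimal sketch of the DOM-traversal variant of those three steps, using PHP's bundled DOMDocument instead of a regular expression. The function names findFeedLinks and resolveUrl are made up for this illustration, and the URL resolution is deliberately simplified (it does not handle "../" segments or a <base> tag), so treat it as a starting point rather than a complete implementation.

```php
<?php
// Hypothetical helper: turn an href found in the page into an absolute URL.
// Simplified on purpose; does not normalize "../" or honor <base href>.
function resolveUrl(string $href, string $baseUrl): string
{
    if ($href === '') {
        return $baseUrl;
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href; // already absolute
    }
    $base   = parse_url($baseUrl);
    $scheme = $base['scheme'] ?? 'http';
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href; // protocol-relative
    }
    $origin = $scheme . '://' . ($base['host'] ?? '');
    if (isset($base['port'])) {
        $origin .= ':' . $base['port'];
    }
    if ($href[0] === '/') {
        return $origin . $href; // root-relative
    }
    // Relative to the directory part of the page's path.
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Hypothetical helper: list every RSS/Atom feed advertised via
// <link rel="alternate" type="application/rss+xml|atom+xml" href="...">.
function findFeedLinks(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $feedTypes = ['application/rss+xml', 'application/atom+xml'];
    $feeds = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel may hold several space-separated tokens.
        $rel  = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $type = strtolower(trim($link->getAttribute('type')));
        $href = trim($link->getAttribute('href'));
        if ($href === '' || !in_array('alternate', $rel, true) || !in_array($type, $feedTypes, true)) {
            continue;
        }
        $feeds[] = resolveUrl($href, $baseUrl);
    }
    return $feeds;
}
```

To use it against a live site, fetch the page first (for example with file_get_contents($url)), check that the result is not false, and pass both the HTML and the page URL to findFeedLinks. Unlike the regex approach, this does not care about attribute order or quoting style, at the cost of parsing the whole document.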

Upvotes: 4

Quentin

Reputation: 943163

The rules for making RSS discoverable are fairly well documented. You just need to parse the HTML and look for the elements described.

Upvotes: 1
