AntonioJunior

Reputation: 959

Problems on getting a page's title with PHP

I wrote this function in PHP to get a page's title. I know it might look a bit messy, but that's because I'm a beginner in PHP. I previously tried preg_match("/<title>(.+)<\/title>/i", $returned_content, $m) inside the if, and it didn't work as I expected.

function get_page_title($url) {
    $returned_content = get_url_contents($url);
    $returned_content = str_replace("\n", "", $returned_content);
    $returned_content = str_replace("\r", "", $returned_content);
    $lower_rc = strtolower($returned_content);
    $pos1 = strpos($lower_rc, "<title>") + strlen("<title>");
    $pos2 = strpos($lower_rc, "</title>");
    if ($pos2 > $pos1)
        return substr($returned_content, $pos1, $pos2-$pos1);
    else
        return $url;
}

This is what I get when I try to get the titles of the following pages using the function above:

- http://www.google.com -> "302 Moved"
- http://www.facebook.com -> "http://www.facebook.com"
- http://www.revistabula.com/posts/listas/100-links-para-clicar-antes-de-morrer -> "http://www.revistabula.com/posts/listas/100-links-para-clicar-antes-de-morrer" (When I add a / to the end of the link, I can get the title successfully: "100 links para clicar antes de morrer | Revista Bula")

My questions are:

- I know Google is redirecting to my country's mirror when I try to access google.com, but how can I get the title of the page it redirects to?
- What is wrong in my function that makes it get the title of some pages, but not of others?

Upvotes: 2

Views: 288

Answers (2)

user783437

Reputation: 217

Why not try something like this? It works well for simple pages.

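function get_page_title($url)
{
    // file_get_contents() returns false on failure
    $source = file_get_contents($url);
    if ($source === false)
        return null;

    // i: match <TITLE> too; s: let . span newlines;
    // .*? is lazy so it stops at the first </title>;
    // [^>]* allows attributes on the opening tag
    $results = preg_match("/<title[^>]*>(.*?)<\/title>/is", $source, $title_matches);
    if (!$results)
        return null;

    // the first capture group is the title
    return trim($title_matches[1]);
}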

Upvotes: 0

Brad

Reputation: 163270

HTTP clients should follow redirects. That 302 status code means that the content you tried to get isn't at that location, and the client should follow the Location: header to figure out where it is.

You have two problems here. The first is not following redirects. If you use cURL, you can get it to follow redirects by setting this:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);

See this question for a full solution:

Make curl follow redirects?
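
As a rough illustration of the CURLOPT_FOLLOWLOCATION suggestion above, here is a minimal sketch of a fetch helper. The function name fetch_following_redirects, the redirect cap of 5, and the 10-second timeout are assumptions made for this example, not part of the original question; it could stand in for the asker's get_url_contents, whose body was not shown.

```php
<?php
// Hypothetical helper (name and limits are assumptions for illustration):
// fetch a URL with cURL, following HTTP redirects such as Google's 302.
function fetch_following_redirects(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // arbitrary cap to avoid redirect loops
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // arbitrary timeout in seconds

    $body = curl_exec($ch);

    // curl_exec() returns false on failure when RETURNTRANSFER is set
    return $body === false ? null : $body;
}
```

A call such as fetch_following_redirects("http://www.google.com") should then return the body of the final page rather than the "302 Moved" stub, provided the cURL extension is enabled.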

The second problem is that you are parsing HTML with RegEx. Don't do that. See this question for better alternatives:

How do you parse and process HTML/XML in PHP?
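
One of the alternatives to regex is PHP's bundled DOM extension. The following is only a sketch under that assumption; the function name extract_title is made up for this example, and it takes the already-downloaded HTML as a string.

```php
<?php
// Hypothetical helper (name is an assumption for illustration):
// extract the <title> text from an HTML string using DOMDocument.
function extract_title(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}
```

Because the parser handles the markup, this copes with an uppercase <TITLE>, attributes on the tag, and titles that span several lines, all of which trip up a plain strpos or a simple regex. Pages that do not declare their character set may still need extra handling for non-ASCII titles.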

Upvotes: 5
