Silko

Reputation: 602

Google Drive API - get document outline

In Google Docs you can see and navigate through the document outline. I'm trying to access this outline through the Google Drive API, but I can't find any documentation for it. This is my code so far:

    //authenticate
    $this->authenticate();

    $Service = new Google_Service_Drive($this->Client);
    $File = $Service->files->get($FileID);

    return $File;

I get a document object back, but I can't find any function that returns the outline. I need the outline links so that my application can jump to specific parts of the document. Any ideas on how this can be achieved?

Upvotes: 1

Views: 536

Answers (2)

Silko

Reputation: 602

I finally solved this problem after DaImTo pointed me in the right direction. After getting the file resource, I used it to get the export link for the HTML version of my document, and then used that link to retrieve the document's HTML content with Google_Http_Request. (Google documentation for this part)

public function retrive_file_outline($FileID) {
    //authenticate
    $this->authenticate();

    $Service = new Google_Service_Drive($this->Client);
    $File = $Service->files->get($FileID);

    $DownloadUrl = $File->getExportLinks()["text/html"];

    if ($DownloadUrl) {
        $Request = new Google_Http_Request($DownloadUrl, 'GET', null, null);
        $HttpRequest = $Service->getClient()->getAuth()->authenticatedRequest($Request);
        if ($HttpRequest->getResponseHttpCode() == 200) {
            return array($File, $HttpRequest->getResponseBody());
        } else {
            // An error occurred.
            return null;
        }
    } else {
        // The file doesn't have any content stored on Drive.
        return null;
    }
}

After that I parsed the HTML content using DOMDocument. All the headings have id attributes, which are used as anchor links. I retrieved that id for every heading (h1 to h6) and concatenated it with my document's edit URL. That gave me all my outline links. Here is the parsing and concatenating part:

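public function test($FileID) {
    $File = $this->model_google->retrive_file_outline($FileID);
    if ($File === null) {
        // The export failed or the file has no HTML export link.
        return;
    }

    $DOM = new DOMDocument;
    $DOM->loadHTML($File[1]);

    $TagNames = ["h1", "h2", "h3", "h4", "h5", "h6"];
    foreach($TagNames as $TagName) {
        $Items = $DOM->getElementsByTagName($TagName);
        foreach($Items as $Item) {
            $ID = $Item->attributes->getNamedItem("id");
            if ($ID === null) {
                // A heading without an id has no anchor to link to.
                continue;
            }
            // Escape the title so special characters in a heading cannot break the markup.
            echo "<a target='_blank' href='" . $File[0]->alternateLink ."#heading=". $ID->nodeValue . "'>" . htmlspecialchars($Item->nodeValue) . "</a><br />";
        }
    }
}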

EDIT: I merged the functions retrive_file_outline and test into a single retrive_file_outline function, which now returns an array of document headings with their links and ids:

public function retrive_file_outline($FileID) {
    //authenticate
    $this->authenticate();

    $Service = new Google_Service_Drive($this->Client);
    $File = $Service->files->get($FileID);

    $DownloadUrl = $File->getExportLinks()["text/html"];

    if ($DownloadUrl) {
        $Request = new Google_Http_Request($DownloadUrl, 'GET', null, null);
        $HttpRequest = $Service->getClient()->getAuth()->authenticatedRequest($Request);
        if ($HttpRequest->getResponseHttpCode() == 200) {
            $DOM = new DOMDocument;
            $DOM->loadHTML($HttpRequest->getResponseBody());

            $TagNames = ["h1", "h2", "h3", "h4", "h5", "h6"];
            $Headings = array();
            foreach($TagNames as $TagName) {
                $Items = $DOM->getElementsByTagName($TagName);
                foreach($Items as $Item) {
                    $ID = $Item->attributes->getNamedItem("id");
                    if ($ID === null) {
                        // A heading without an id has no anchor to link to.
                        continue;
                    }
                    $Heading = array(
                        "link" => $File->alternateLink . "#heading=" . $ID->nodeValue,
                        "heading_id" => $ID->nodeValue,
                        "title" => $Item->nodeValue
                    );

                    array_push($Headings, $Heading);
                }
            }

            return $Headings;
        } else {
            // An error occurred.
            return null;
        }
    } else {
        // The file doesn't have any content stored on Drive.
        return null;
    }
}
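
One thing to be aware of: because the loop goes tag by tag, the returned array is grouped by heading level (all h1 entries, then all h2 entries, and so on) rather than following the order of the document. Below is a minimal sketch of one way to keep document order using DOMXPath, assuming the exported HTML keeps id attributes on its headings as described above. The function name extract_headings, the sample HTML, and the example.com URL are made up for illustration and are not part of the Google client library.

```php
<?php
// Hypothetical helper (not part of any Google library): collects headings
// that carry an id attribute, in the order they appear in the document.
function extract_headings(string $Html, string $BaseUrl): array {
    $DOM = new DOMDocument;

    // Exported HTML is not guaranteed to be valid markup, so collect
    // parser warnings internally instead of printing them.
    $Previous = libxml_use_internal_errors(true);
    $DOM->loadHTML($Html);
    libxml_clear_errors();
    libxml_use_internal_errors($Previous);

    // A single union query returns the matching nodes in document order.
    $XPath = new DOMXPath($DOM);
    $Items = $XPath->query('//h1[@id] | //h2[@id] | //h3[@id] | //h4[@id] | //h5[@id] | //h6[@id]');

    $Headings = array();
    foreach ($Items as $Item) {
        $ID = $Item->getAttribute('id');
        $Headings[] = array(
            // "h2" -> 2, useful for indenting a nested outline.
            'level' => (int) substr($Item->nodeName, 1),
            'link' => $BaseUrl . '#heading=' . $ID,
            'heading_id' => $ID,
            'title' => trim($Item->textContent),
        );
    }

    return $Headings;
}

// Made-up sample input standing in for the exported document HTML.
$Sample = '<html><body>'
    . '<h1 id="h.a1">Intro</h1>'
    . '<h2 id="h.b2">Setup</h2>'
    . '<h1 id="h.c3">Usage</h1>'
    . '<h2>No id</h2>'
    . '</body></html>';

$Outline = extract_headings($Sample, 'https://example.com/doc/edit');

foreach ($Outline as $Heading) {
    // Indent by heading level to mimic the outline panel.
    echo str_repeat('  ', $Heading['level'] - 1) . $Heading['title'] . ' -> ' . $Heading['link'] . "\n";
}
```

In the merged function, the same idea would mean passing $HttpRequest->getResponseBody() and $File->alternateLink to such a helper in place of the nested foreach loops. The level field is an extra that the original array does not have; it is only there to make nesting possible.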

Upvotes: 1

Linda Lawton - DaImTo

Reputation: 117016

files.get returns a file resource, and a file resource is just the metadata for a file: it is the information about the file stored on Google Drive.

You will need to load it in some document application to find any outline links. The metadata doesn't contain anything about the data stored within the file.

Upvotes: 1
