tbw875

Reputation: 379

How to loop over nextPageToken using GoogleDrive's Python Quickstart

My goal is to have a list of all of the items and folders in everyone's Google Drive. I'm starting by making sure the script works on my own Drive. I have read the Drive REST API documentation cover to cover, and eventually found this code, which can also be found here.

from __future__ import print_function
import httplib2
import os
import sys

from apiclient import discovery
from oauth2client import client
from oauth2client import tools
from oauth2client.file import Storage

reload(sys)
sys.setdefaultencoding('utf-8')

try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

# If modifying these scopes, delete your previously saved credentials
# at ~/.credentials/drive-python-quickstart.json
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Drive API Python Quickstart'


def get_credentials():
    """Gets valid user credentials from storage.

    If nothing has been stored, or if the stored credentials are invalid,
    the OAuth2 flow is completed to obtain the new credentials.

    Returns:
        Credentials, the obtained credential.
    """
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,
                                   'drive-python-quickstart.json')

    store = Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else: # Needed only for compatibility with Python 2.6
            credentials = tools.run(flow, store)
        print('Storing credentials to ' + credential_path)
    return credentials

def main():
    """Shows basic usage of the Google Drive API.

    Creates a Google Drive API service object and outputs the names and
    MIME types for up to 1000 files (a single page of results).
    """
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    service = discovery.build('drive', 'v3', http=http)

    results = service.files().list(
        pageSize=1000,fields="nextPageToken, files(mimeType, name)").execute()
    items = results.get('files', [])
    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            print('{0} ({1})'.format(item['name'], item['mimeType']))

if __name__ == '__main__':
    main()

My problem is with nextPageToken and how to use it properly. The maximum pageSize is 1000, so I need a loop: fetch nextPageToken from the resulting JSON and feed it back into the files().list() call in main() to get the next 1000 results. How do I do this?

Upvotes: 8

Views: 14348

Answers (4)

laoc81

Reputation: 51

You must keep looping while the token for the next page is not null, as in the code at the end.

Do not forget to install the client libraries first:

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

Then copy and paste this code (do not forget to change your paths and your Google Drive folder ID at the end):

from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload
            
# If modifying these scopes, delete the file token.pickle.
SCOPES = [
'https://www.googleapis.com/auth/spreadsheets', 
"https://www.googleapis.com/auth/drive.file", 
"https://www.googleapis.com/auth/drive"
]
             

# FOR AUTHENTICATION
def authenticate():
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)

    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'YOUR PATH FOR THE CREDENTIALS JSON/credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        
        with open('YOUR PATH /token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    
    service = build('drive', 'v3', credentials=creds)
    return service

    
# LISTS TO TAKE ALL FILES AND IDs FROM SPECIFIC FOLDER
listFilesDrive=[]
line = []
        
# TO TAKE ALL FILES FROM SPECIFIC FOLDER
def listFilesFromGoogleFolder(IDFolder):
    service = authenticate()
    # Call the Drive v3 API
    # Use the IDFolder argument rather than the module-level variable
    results = service.files().list(q="'{}' in parents".format(IDFolder),
        fields="nextPageToken, files(id, name)").execute()
    items = results.get('files', [])
    # TAKE TOKEN FROM THE NEXT PAGE (IF THERE IS NO ONE, THIS VALUE WILL BE NULL)
    token = results.get('nextPageToken', None)
    #print('token->>',nextPageToken)

    if not items:
        print('No files found.')
    else:
        print('Files:')
        line = []
        for item in items:
            # TAKE FIRST PAGE IN A LIST ->> "listFilesDrive"
            arquivo = item['name']
            IDarquivo = item['id']
            line.append(arquivo)
            line.append(IDarquivo)
            listFilesDrive.append(line)
            line=[]
            print(u'{0} ({1})'.format(item['name'], item['id']))

    # KEEP LOOPING WHILE THERE IS A TOKEN FOR ANOTHER PAGE
    # (the service built above is reused; there is no need to authenticate again)
    while token is not None:
        results = service.files().list( q="'{}' in parents".format(IDFolder),
            pageToken=token,
            fields="nextPageToken, files(id, name)").execute()
        items = results.get('files', [])
        # TAKE A NEW TOKEN FOR THE NEXT PAGE, IF THERE IS NO, THIS TOKEN WILL BE NULL ("None")
        token = results.get('nextPageToken', None)

        if not items:
            print('No files found.')
        else:
            print('Files:')
            line = []
            for item in items:
                arquivo = item['name']
                IDarquivo = item['id']
                line.append(arquivo)
                line.append(IDarquivo)
                listFilesDrive.append(line)
                line=[]
                print(u'{0} ({1})'.format(item['name'], item['id']))

    print(len(listFilesDrive))
    print(listFilesDrive)
        
        
        
# put your specific information
if __name__ == '__main__':
    FolderIDFromGDrive='YOUR FOLDER ID'
    listFilesFromGoogleFolder(FolderIDFromGDrive)

Upvotes: 2

parapet

Reputation: 69

I had quite a bit of trouble with this. I didn't read the example closely enough to notice that nextPageToken & newStartPageToken were not the same thing.

I split up the functions a little and added a loop. Basically, return the startPageToken and loop over the same function / call the function as required.

from __future__ import print_function
import httplib2
import os

#julian
import time

from apiclient import discovery
from oauth2client import client
from oauth2client import tools
from oauth2client.file import Storage

try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

# If modifying these scopes, delete your previously saved credentials
# at ~/.credentials/drive-python-quickstart.json
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Drive API Python Quickstart'


def get_credentials():
    """Gets valid user credentials from storage.

    If nothing has been stored, or if the stored credentials are invalid,
    the OAuth2 flow is completed to obtain the new credentials.

    Returns:
        Credentials, the obtained credential.
    """
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,'drive-python-quickstart.json')

    store = Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else: # Needed only for compatibility with Python 2.6
            credentials = tools.run(flow, store)
        print('Storing credentials to ' + credential_path)
    return credentials

def main():
    """Shows basic usage of the Google Drive API.

    Creates a Google Drive API service object, then polls the changes feed
    every 10 seconds and prints each change it finds.
    """
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    service = discovery.build('drive', 'v3', http=http)

    saved_start_page_token = StartPage_v3(service)
    saved_start_page_token = DetectChanges_v3(service, saved_start_page_token)

    starttime=time.time()
    while True:
        saved_start_page_token = DetectChanges_v3(service, saved_start_page_token)
        time.sleep(10.0 - ((time.time() - starttime) % 10.0))

def StartPage_v3(service):
    response = service.changes().getStartPageToken().execute()
    print('Start token: %s' % response.get('startPageToken'))
    return response.get('startPageToken')

def DetectChanges_v3(service, saved_start_page_token):
    # Begin with our last saved start token for this user or the
    # current token from getStartPageToken()
    page_token = saved_start_page_token
    while page_token is not None:
        response = service.changes().list(pageToken=page_token, spaces='drive').execute()
        for change in response.get('changes', []):
            # Process change; a removed file has no 'file' entry, so guard the lookup
            mimeType = (change.get('file') or {}).get('mimeType')
            print('Change found for: %s' % change)
        if 'newStartPageToken' in response:
            # Last page, save this token for the next polling interval
            saved_start_page_token = response.get('newStartPageToken')
        page_token = response.get('nextPageToken')
    return saved_start_page_token

if __name__ == '__main__':
    main()

Upvotes: 0

Neyoh

Reputation: 633

Let's look at the Google Drive API documentation for the Files: list method.

In the fields parameter of your request you are asking for nextPageToken, so the result will contain the token for the next page (if a next page exists). The result will look something like this:

{
 ...,
 "nextPageToken": "V1*3|0|XXXXXX",
 "files": [
  {
   ...
  },...
  ]
}

You can extract the nextPageToken value like this:

token = results.get('nextPageToken', None)

The list method accepts a string parameter, pageToken:

The token for continuing a previous list request on the next page. This should be set to the value of 'nextPageToken' from the previous response.

Just set the pageToken parameter in the next request to get the next page of results:

    results = service.files().list(
        pageSize=1000,
        pageToken=token,
        fields="nextPageToken, files(mimeType, name)").execute()
    items = results.get('files', [])

Now you can easily write a loop to get all the results.
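
As a rough sketch of that loop, assuming `service` is the Drive v3 object built in the question's `main()`. The helper name `list_all_files` is made up for illustration and is not part of the quickstart:

```python
def list_all_files(service, page_size=1000):
    """Collect every file by following nextPageToken until it is absent."""
    all_files = []
    page_token = None  # no token yet, so the first call fetches page one

    while True:
        params = {
            'pageSize': page_size,
            'fields': "nextPageToken, files(mimeType, name)",
        }
        if page_token:
            # Only send pageToken once a previous response has supplied one
            params['pageToken'] = page_token

        response = service.files().list(**params).execute()
        all_files.extend(response.get('files', []))

        # The key is missing on the last page, so .get() returns None
        page_token = response.get('nextPageToken')
        if page_token is None:
            break

    return all_files
```

In the question's `main()`, the single `files().list(...).execute()` call could then be replaced by `items = list_all_files(service)`, with the printing loop left unchanged.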

Upvotes: 5

ReyAnthonyRenacia

Reputation: 17651

I will try to demonstrate the concept for you, but you'll do the implementation in Python. The short answer is nextPageToken: it lets you retrieve the results from the next page.


When you perform a GET request, a nextPageToken is included in the response whenever more results remain. So if you had 1000 results but only wanted to display 20 per page, you can fetch the remaining 980 files using nextPageToken.

Run this URL and you'll see something like:

"kind": "drive#fileList",
 "nextPageToken": "V1*3|0|CjkxOHY2aDdROE9JYkJGWUJEaU5Ybm1OVURSemJTcWFMa2lRQlVJSnVxYmI2YkYzMmhnVHozeWkwRnASBxCqqcG4kis",
 "incompleteSearch": false,

The value of the nextPageToken here is what you use to get to the next page. When you get to the next page and you have more results, a new nextPageToken will be generated for you until you view/get all the results (980-1000).
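
A minimal Python sketch of the same idea, assuming the `google-api-python-client` library, whose resource collections offer a `list_next()` helper that builds the follow-up request from the previous response and returns `None` once there is no `nextPageToken`. The function name `iter_files` and the page size of 20 are illustrative only:

```python
def iter_files(service, page_size=20):
    """Yield files one at a time, fetching a new page only when needed."""
    request = service.files().list(
        pageSize=page_size,
        fields="nextPageToken, files(id, name)",
    )
    while request is not None:
        response = request.execute()
        for item in response.get('files', []):
            yield item
        # list_next() reads nextPageToken from the response for us;
        # it returns None when the last page has been reached
        request = service.files().list_next(request, response)
```

Because this is a generator, a caller can stop early (for example after the first 20 results) without requesting the remaining pages.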

Upvotes: 1
