Reputation: 483
I'm trying to download a large folder with 50,000 images from my Google Drive onto a local server using Python. The following code raises the limit error shown below. Are there any alternative solutions?
import gdown
url = 'https://drive.google.com/drive/folders/135hTTURfjn43fo4f?usp=sharing' # I'm showing a fake token
gdown.download_folder(url)
Failed to retrieve folder contents:
The gdrive folder with url: https://drive.google.com/drive/folders/135hTTURfjn43fo4f?usp=sharing has at least 50 files, gdrive can't download more than this limit, if you are ok with this, please run again with --remaining-ok flag.
Upvotes: 8
Views: 13833
Reputation: 41
The download limit is set in ../gdown/download_folder.py. If you installed gdown in a virtual environment, simply edit the download_folder.py file located in .venv/lib/python3.*/site-packages/gdown/. Find the line MAX_NUMBER_FILES = 50 and set the value to your new maximum.
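If you are unsure where the installed copy lives, a minimal sketch to print its path (assuming gdown is importable in the current environment):
import os
import gdown

# Print the location of the installed download_folder.py (the exact path varies by environment)
print(os.path.join(os.path.dirname(gdown.__file__), 'download_folder.py'))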
Upvotes: 4
Reputation: 45
I was trying to download CORDv0 from Google Drive via the CLI, and there is no other good way to download it in one line. The best way is to save the folder as a zip archive and then download it as a single unified file.
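Once the folder exists as a single zip archive on Drive, a minimal sketch of the single-file download with gdown (the file ID below is a placeholder, not a real one):
import gdown

# Hypothetical file ID of the zipped folder; replace with your own
zip_url = 'https://drive.google.com/uc?id=YOUR_ZIP_FILE_ID'
gdown.download(zip_url, 'archive.zip', quiet=False)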
In some cases, raising the download limit can help. In Colab, I used:
!pip uninstall gdown --yes
!cd .. && git clone https://github.com/wkentaro/gdown

# Patch the hard-coded folder limit in the cloned source
with open('../gdown/gdown/download_folder.py', 'r') as f:
    code = f.read().replace('MAX_NUMBER_FILES = 50', 'MAX_NUMBER_FILES = 10000')
with open('../gdown/gdown/download_folder.py', 'w') as f:
    f.write(code)

# Reinstall gdown from the patched source
!cd ../gdown && pip install -e . --no-cache-dir
!pip show gdown
But keep gdown's errors in mind. As mentioned above, the gdown library is not the best choice here.
Upvotes: 1
Reputation: 29
This is a workaround that I used to download individual file URLs using gdown:
import re
import os

urls = <copied_urls>  # comma-separated string of share links copied from Drive
url_list = urls.split(', ')
pat = re.compile(r'https://drive.google.com/file/d/(.*)/view\?usp=sharing')

for url in url_list:
    m = re.match(pat, url)
    file_id = m.group(1)
    down_url = f'https://drive.google.com/uc?id={file_id}'
    os.system(f'gdown {down_url}')
Note: This solution isn't ideal for 50,000 images, as the copied-URLs string would be too large. If your string is that big, copy it into a file and process the file instead of using a variable. In my case I only had to copy 75 large files.
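For the file-based variant, a minimal sketch (assuming the share links were saved one per line to urls.txt, a hypothetical file name):
import re
import os

pat = re.compile(r'https://drive.google.com/file/d/(.*)/view\?usp=sharing')
with open('urls.txt') as f:  # hypothetical file holding one share link per line
    for line in f:
        m = re.match(pat, line.strip())
        if m:
            os.system(f'gdown https://drive.google.com/uc?id={m.group(1)}')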
Upvotes: 2
Reputation: 93
!pip uninstall --yes gdown # After running this line, restart Colab runtime.
!pip install gdown -U --no-cache-dir
import gdown
url = r'https://drive.google.com/drive/folders/1sWD6urkwyZo8ZyZBJoJw40eKK0jDNEni'
gdown.download_folder(url)
Upvotes: -3
Reputation: 11194
As kite mentioned in the comments, use it with the remaining_ok flag:
gdown.download_folder(url, remaining_ok=True)
This isn't mentioned at https://pypi.org/project/gdown/, so it may cause some confusion. There are no references to remaining_ok aside from the warning and the GitHub source. It seems gdown is strictly limited to 50 files, and I haven't found a way of circumventing that.
If something other than gdown is an option, then see the code below.
import io
import os
import os.path
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
credential_json = {
    ### Create a service account and use its JSON content here ###
    ### https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account
    ### credentials.json looks like this:
    "type": "service_account",
    "project_id": "*********",
    "private_key_id": "*********",
    "private_key": "-----BEGIN PRIVATE KEY-----\n*********\n-----END PRIVATE KEY-----\n",
    "client_email": "service-account@*********.iam.gserviceaccount.com",
    "client_id": "*********",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account%40*********.iam.gserviceaccount.com"
}

credentials = service_account.Credentials.from_service_account_info(credential_json)
drive_service = build('drive', 'v3', credentials=credentials)

folderId = '### Google Drive Folder ID ###'
outputFolder = 'output'

# Create the output folder if it does not exist
if not os.path.isdir(outputFolder):
    os.mkdir(outputFolder)

# List every file in the folder, following pagination (1000 files per page)
items = []
pageToken = ""
while pageToken is not None:
    response = drive_service.files().list(q="'" + folderId + "' in parents", pageSize=1000, pageToken=pageToken,
                                          fields="nextPageToken, files(id, name)").execute()
    items.extend(response.get('files', []))
    pageToken = response.get('nextPageToken')

# Download each file and save it under outputFolder
for file in items:
    file_id = file['id']
    file_name = file['name']
    request = drive_service.files().get_media(fileId=file_id)
    fh = io.FileIO(outputFolder + '/' + file_name, 'wb')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
    print(f'{file_name} downloaded completely.')
Upvotes: 2