slashie

Reputation: 137

Upload a CSV file to GCP Storage, timeout error

I used the following code to upload a file to GCP Storage and received a timeout error. The file is around 1 GB, which is relatively large. How can I solve this upload timeout issue?

File "/Users/xxxx/opt/anaconda3/lib/python3.8/site-packages/google/resumable_media/requests/_request_helpers.py", line 136, in http_request return _helpers.wait_and_retry(func, RequestsMixin._get_status_code, retry_strategy) File "/Users/xxxx/opt/anaconda3/lib/python3.8/site-packages/google/resumable_media/_helpers.py", line 186, in wait_and_retry raise error File "/Users/xxx/opt/anaconda3/lib/python3.8/site-packages/google/resumable_media/_helpers.py", line 175, in wait_and_retry response = func() File "/Users/xxxx/opt/anaconda3/lib/python3.8/site-packages/google/auth/transport/requests.py", line 482, in request response = super(AuthorizedSession, self).request( File "/Users/xxxxx/opt/anaconda3/lib/python3.8/site-packages/requests/sessions.py", line 542, in request resp = self.send(prep, **send_kwargs) File "/Users/xxxxx/opt/anaconda3/lib/python3.8/site-packages/requests/sessions.py", line 655, in send r = adapter.send(request, **kwargs) File "/Users/xxxxxx/opt/anaconda3/lib/python3.8/site-packages/requests/adapters.py", line 498, in send raise ConnectionError(err, request=request) requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out'))


import logging
import os
from pathlib import Path

from google.cloud import storage


def upload_file_to_gcs(local_filepath: str, bucket_name: str, gcs_filepath: str = None):

    if local_filepath is None:
        raise ValueError("local_filepath cannot be None")

    # os.path.isfile already returns False for paths that do not exist
    if not os.path.isfile(local_filepath):
        raise FileNotFoundError(f"{local_filepath} is not a file or does not exist.")

    if bucket_name is None:
        raise ValueError("bucket_name cannot be None")

    # bucket_exist and create_bucket are helper functions defined elsewhere
    if not bucket_exist(bucket_name):
        logging.info(f"Bucket {bucket_name} does not exist. Creating...")
        create_bucket(bucket_name)

    logging.info(f"Uploading {local_filepath} to GCS...")

    # Initialise a client
    storage_client = storage.Client()

    # Default the object name to the local file name
    if gcs_filepath is None:
        gcs_filepath = Path(local_filepath).name

    # Create the bucket object
    bucket = storage_client.get_bucket(bucket_name)

    # Upload; upload_from_filename returns None, so there is
    # nothing useful to assign from the call itself
    blob = bucket.blob(gcs_filepath)
    blob.upload_from_filename(local_filepath)

    logging.info(f"Uploaded {local_filepath} to {bucket_name} in GCS.")

    return vars(blob)



Upvotes: 0

Views: 2823

Answers (1)

CloudBalancing

Reputation: 1676

You can define a timeout when creating the bucket client. Take a look at https://googleapis.dev/python/storage/latest/retry_timeout.html
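For instance, a minimal sketch (the bucket and object names below are placeholders) that raises the per-request timeout on the upload call itself, which upload_from_filename accepts:

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket("my-bucket")  # placeholder bucket name
blob = bucket.blob("data.csv")                   # placeholder object name

# Give each request of the upload up to 300 seconds instead of
# the default 60 before the write operation times out.
blob.upload_from_filename("data.csv", timeout=300)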

If you have a bad internet connection, you can also play around with the chunk size of the uploads (although this is not recommended):

from google.cloud import storage

# Note: these are private module-level attributes of google-cloud-storage;
# overriding them changes the default chunking for all uploads.
storage.blob._DEFAULT_CHUNKSIZE = 5 * 1024 * 1024  # 5 MB
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB
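Alternatively, the chunk size can be set on a single blob instead of patching the private module-level defaults. A sketch using the same placeholder names as above:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # placeholder bucket name

# chunk_size must be a multiple of 256 KB (262144 bytes); a 5 MB
# chunk keeps each request small enough for a slow connection.
blob = bucket.blob("data.csv", chunk_size=5 * 1024 * 1024)
blob.upload_from_filename("data.csv")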

Upvotes: 2
