Minato

Reputation: 462

GCS broken pipe error when trying to upload large files

I'm trying to upload a file to GCS after decompressing it from .csv.gz to .csv. Decompression grows the file from 500 MB to around 5 GB. Extracting the .csv.gz file to a temporary path works, but uploading the extracted file to GCS fails with the following error:

[2019-11-11 13:59:58,180] {models.py:1796} ERROR - [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models.py", line 1664, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/airflow/gcs/dags/operators/s3_to_gcs_transform_operator.py", line 220, in execute
    gcs_hook.upload(dest_gcs_bucket, dest_gcs_object, target_file, gzip=True)
  File "/home/airflow/gcs/dags/hooks/gcs_hook_conn.py", line 208, in upload
    .insert(bucket=bucket, name=object, media_body=media)
  File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/http.py", line 835, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/http.py", line 179, in _retry_request
    raise exception
  File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/http.py", line 162, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/google_auth_httplib2.py", line 198, in request
    uri, method, body=body, headers=request_headers, **kwargs)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/gcp_api_base_hook.py", line 155, in new_request
    redirections, connection_type)
  File "/opt/python3.6/lib/python3.6/site-packages/httplib2/__init__.py", line 1924, in request
    cachekey
  File "/opt/python3.6/lib/python3.6/site-packages/httplib2/__init__.py", line 1595, in _request
    conn, request_uri, method, body, headers
  File "/opt/python3.6/lib/python3.6/site-packages/httplib2/__init__.py", line 1502, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 1065, in _send_output
    self.send(chunk)
  File "/opt/python3.6/lib/python3.6/http/client.py", line 986, in send
    self.sock.sendall(data)
  File "/opt/python3.6/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/opt/python3.6/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/opt/python3.6/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe

From what I understood, the error could be due to the following:

Your server process has received a SIGPIPE writing to a socket. This usually happens when you write to a socket fully closed on the other (client) side. This might be happening when a client program doesn't wait till all the data from the server is received and simply closes a socket (using close function).

But I have no idea whether this is the issue or how I can fix this. Can someone help?

Upvotes: 1

Views: 2232

Answers (1)

marian.vladoi

Reputation: 8074

Try uploading large files in chunks, so that the file is sent as several smaller requests instead of a single long-running one.

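from google.cloud import storage

# Upload in 128 MiB chunks; chunk_size must be a multiple of 256 KiB.
CHUNK_SIZE = 128 * 1024 * 1024

client = storage.Client()
bucket = client.bucket('destination')
# Setting chunk_size makes the client use a resumable, chunked upload.
blob = bucket.blob('really-big-blob', chunk_size=CHUNK_SIZE)
blob.upload_from_filename('/path/to/really-big-file')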
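To get a feel for what the chunk size means for a file of about 5 GB, here is a minimal sketch that wraps the snippet above. It assumes the google-cloud-storage client library. The helper names check_chunk_size, requests_needed and upload_in_chunks are hypothetical and not part of any library, and the request count is only an estimate of the data-carrying requests (it ignores the request that opens the upload session).

```python
import os

# google-cloud-storage requires chunk_size to be a multiple of 256 KiB.
CHUNK_MULTIPLE = 256 * 1024
CHUNK_SIZE = 128 * 1024 * 1024  # 128 MiB, as in the snippet above


def check_chunk_size(chunk_size):
    """Return chunk_size if it is a positive multiple of 256 KiB, else raise."""
    if chunk_size <= 0 or chunk_size % CHUNK_MULTIPLE != 0:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    return chunk_size


def requests_needed(file_size, chunk_size):
    """Estimate how many chunk requests an upload of file_size bytes needs."""
    # Integer ceiling division; an empty file still needs one request.
    return max(1, -(-file_size // chunk_size))


def upload_in_chunks(bucket_name, blob_name, path, chunk_size=CHUNK_SIZE):
    """Upload the file at path in chunk_size pieces (hypothetical helper)."""
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(
        blob_name, chunk_size=check_chunk_size(chunk_size)
    )
    blob.upload_from_filename(path)
    return requests_needed(os.path.getsize(path), chunk_size)
```

With 128 MiB chunks, a 5 GiB file works out to 40 chunk requests. A smaller chunk size means more requests, but less data to resend if one of them fails.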

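You can also look at Parallel Composite Uploads, where gsutil splits a large file into components, uploads them in parallel, and composes them into a single object.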
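The composite idea can also be sketched by hand with the google-cloud-storage client. This is only an illustration under stated assumptions, not how gsutil implements it: split_ranges and composite_upload are hypothetical helpers, the ".part-NN" object names are made up for the example, and the components are uploaded one after another rather than in parallel to keep the sketch short.

```python
import os

MAX_COMPONENTS = 32  # a single compose request accepts at most 32 source objects


def split_ranges(file_size, parts=MAX_COMPONENTS):
    """Split file_size bytes into at most `parts` contiguous (offset, length) ranges."""
    if file_size <= 0:
        return []
    parts = min(parts, MAX_COMPONENTS, file_size)
    base, extra = divmod(file_size, parts)
    ranges, offset = [], 0
    for i in range(parts):
        # Spread the remainder over the first `extra` ranges.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def composite_upload(bucket_name, blob_name, path):
    """Upload `path` as separate component objects, then compose them into one."""
    # Imported lazily; split_ranges itself needs no third-party library.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    components = []
    for index, (offset, length) in enumerate(split_ranges(os.path.getsize(path))):
        part = bucket.blob(f"{blob_name}.part-{index:02d}")  # hypothetical naming scheme
        with open(path, "rb") as source:
            source.seek(offset)
            part.upload_from_file(source, size=length)  # reads `length` bytes from offset
        components.append(part)

    final = bucket.blob(blob_name)
    final.compose(components)  # server-side concatenation, in list order
    for part in components:
        part.delete()  # remove the temporary component objects
    return final
```

Each component is an independent upload, so a failure only costs one component rather than the whole file. To make the uploads truly parallel, the loop body could be submitted to a thread pool.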

Similar SO question link.

Upvotes: 1
