Sri Harsha

Reputation: 643

GCP Cloud Composer - not able to download large file to data folder

Every week I have to download a file from an FTP server to a GCS bucket and then import that file from the GCS bucket into BigQuery. I started implementing this dataflow job in GCP Cloud Composer.

I broke the dataflow into three tasks (a sketch of the flow is shown below).

I am facing issues downloading the file from the FTP server to the Cloud Composer data folder. The approximate size of the file is 20 GB. I used the wget command to download the file; the exact command is wget -c "remote_file_path" -P "/home/airflow/gcs/data". The task starts fine, but it fails after 30 minutes and the file size in the data folder shows as zero bytes. I checked the logs and didn't find any errors.
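For reference, here is a minimal sketch of the flow described above, assuming Airflow 1.10.x as bundled with Composer 1.x. The DAG id, connection IDs, bucket, object path, and BigQuery table are placeholders, and only the download and load steps are shown, not the exact three-task split:

    # Minimal sketch: FTP download into the GCS-fused data folder, then load to BigQuery.
    # All names below (DAG id, bucket, object, table) are placeholders.
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
    from airflow.utils.dates import days_ago

    default_args = {"start_date": days_ago(1), "retries": 1}

    with DAG("weekly_ftp_to_bq", schedule_interval="@weekly",
             default_args=default_args, catchup=False) as dag:

        # Task 1: download from the FTP server into /home/airflow/gcs/data,
        # which is the GCSfuse mount of the environment bucket's data/ prefix.
        download_from_ftp = BashOperator(
            task_id="download_from_ftp",
            bash_command='wget -c "remote_file_path" -P "/home/airflow/gcs/data"',
        )

        # Task 2: load the file from the bucket into BigQuery.
        load_to_bq = GoogleCloudStorageToBigQueryOperator(
            task_id="load_to_bq",
            bucket="your-composer-bucket",                                  # placeholder
            source_objects=["data/your_file.csv"],                          # placeholder
            destination_project_dataset_table="project.dataset.table",     # placeholder
            source_format="CSV",
            write_disposition="WRITE_TRUNCATE",
        )

        download_from_ftp >> load_to_bq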

I tried the same procedure with another file of size 1 GB, and it worked like a charm.

I also tried using the SFTPOperator; after one hour of running I got an error saying "Key-exchange timed out waiting for key negotiation".
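For completeness, this is roughly what that attempt looks like with the Airflow 1.10 contrib SFTPOperator. The connection ID and file paths are placeholders, and the keepalive_interval on SSHHook is only an assumption about what might help keep a long transfer alive, not a verified fix:

    # Sketch of the SFTP-based download; goes inside the same DAG context as above.
    # "my_sftp_conn" and the remote path are placeholders.
    from airflow.contrib.hooks.ssh_hook import SSHHook
    from airflow.contrib.operators.sftp_operator import SFTPOperator, SFTPOperation

    sftp_download = SFTPOperator(
        task_id="sftp_download",
        ssh_hook=SSHHook(ssh_conn_id="my_sftp_conn", keepalive_interval=30),
        remote_filepath="/remote/path/large_file.csv",            # placeholder
        local_filepath="/home/airflow/gcs/data/large_file.csv",
        operation=SFTPOperation.GET,
    )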

Please help me figure this out. I am also open to other solutions for implementing this dataflow.

Thank you.

Upvotes: 0

Views: 1175

Answers (1)

Krisjan O.

Reputation: 21

Updating the Cloud Composer environment solved the issue for us.

We've encountered similar issues with files larger than approximately 1 GB: tasks failing after 30 minutes with a file size of 0 bytes in the /data folder.

We were using Cloud Composer version 1.12.3. The release notes (https://cloud.google.com/composer/docs/release-notes) for version 1.12.5 mention:

Improved GCSfuse stability to resolve intermittent issues where the mounted directory was unavailable

So we updated the Cloud Composer instance to version 1.13.0, and it seems to have fixed the problem.

Upvotes: 1
