Reputation: 643
Every week I have to download a file from an FTP server to a GCS bucket and then import that file from the GCS bucket into BigQuery. I started implementing this data flow as a job in GCP Cloud Composer.
I broke the data flow into three tasks; the download task writes to the Cloud Composer data folder (/home/airflow/gcs/data). I am facing issues downloading the file from the FTP server to that data folder. The approximate size of the file is 20 GB. I used the wget command to download the file; the exact command is wget -c "remote_file_path" -P "/home/airflow/gcs/data". The task starts fine, but it fails after 30 minutes and the file size in the data folder shows as zero bytes. I checked the logs and didn't find any errors.
I tried the same procedure with another file of about 1 GB, and it worked like a charm.
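Roughly, the relevant part of the DAG looks like the sketch below. The DAG id, schedule, remote path, bucket, and table names are placeholders, not my real values, and the operators shown assume the Airflow 1.10 imports that ship with Composer 1.x:

```python
# Minimal sketch of the DAG; identifiers, paths, bucket and table names are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

REMOTE_FILE = "ftp://ftp.example.com/exports/weekly_file.csv"  # placeholder
DATA_DIR = "/home/airflow/gcs/data"  # mapped to the environment's bucket via gcsfuse

with DAG(
    dag_id="weekly_ftp_to_bq",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:

    # Pull the ~20 GB file from the FTP server into the Composer data folder.
    download_file = BashOperator(
        task_id="download_from_ftp",
        bash_command='wget -c "{}" -P "{}"'.format(REMOTE_FILE, DATA_DIR),
        execution_timeout=timedelta(hours=4),  # the transfer needs well over 30 minutes
    )

    # Load the synced object from the bucket into BigQuery.
    load_to_bq = GoogleCloudStorageToBigQueryOperator(
        task_id="load_to_bigquery",
        bucket="my-composer-bucket",  # placeholder bucket name
        source_objects=["data/weekly_file.csv"],
        destination_project_dataset_table="my_dataset.weekly_table",  # placeholder
        source_format="CSV",
        write_disposition="WRITE_TRUNCATE",
        autodetect=True,
    )

    download_file >> load_to_bq
```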
I also tried using the SFTPOperator; after one hour of running I got an error saying "Key-exchange timed out waiting for key negotiation".
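The SFTP attempt looked roughly like this (the connection id, paths, and schedule are placeholders):

```python
# Rough sketch of the SFTP variant; connection id and paths are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.sftp_operator import SFTPOperator, SFTPOperation

with DAG(
    dag_id="weekly_ftp_to_bq_sftp",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:

    # Fetch the remote file over SFTP into the Composer data folder.
    fetch_file = SFTPOperator(
        task_id="download_via_sftp",
        ssh_conn_id="my_sftp_server",  # placeholder Airflow connection
        remote_filepath="/exports/weekly_file.csv",  # placeholder
        local_filepath="/home/airflow/gcs/data/weekly_file.csv",
        operation=SFTPOperation.GET,
    )
```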
Please help me figure this out. I am also open to other solutions for implementing this data flow.
Thank you.
Upvotes: 0
Views: 1175
Reputation: 21
Updating the Cloud Composer environment solved the issue for us.
We've encountered similar issues with files larger than approximately 1 GB: tasks failing after 30 minutes, with a file size of 0 bytes in the /data folder.
We were using Cloud Composer version 1.12.3. The release notes (https://cloud.google.com/composer/docs/release-notes) for version 1.12.5 mention:
Improved GCSfuse stability to resolve intermittent issues where the mounted directory was unavailable
So we updated the Cloud Composer instance to version 1.13.0, and it seems to have fixed the problem.
Upvotes: 1