Reputation: 63
I am just discovering the tool and I have some questions:
- What exactly do you mean by the type File in (Source, Sink)?
- Is it also possible to send the result of the pipeline directly to an FTP server?
I checked the documentation, but I did not find this information.
Thank you.
Upvotes: 1
Views: 690
Reputation: 714
Short answer: File refers to the filesystem where the pipeline runs. In Data Fusion, if you use the File sink, the contents are written to HDFS on the Dataproc cluster.
Data Fusion has an SFTP Put action that can be used to write to an SFTP server. Here is a simple pipeline that writes from GCS to SFTP:
Step 1: GCS source to File sink. This writes the contents of GCS to HDFS on Dataproc when the pipeline runs.
Step 2: SFTP Put action. This takes the output of the File sink and uploads it to SFTP.
You need to set the output path of the File sink to the same value as the source path in the SFTP Put action.
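To illustrate the path requirement, here is a minimal Python sketch that checks whether the two settings point at the same directory. The dictionaries and their key names (`path`, `source_path`, `dest_dir`) and the example paths are hypothetical stand-ins for the values you would enter in the plugin configuration, not the actual Data Fusion property names.

```python
from posixpath import normpath

# Hypothetical, simplified stage settings. The key names and paths are
# illustrative only, not the real property names of the Data Fusion plugins.
file_sink = {"stage": "File sink", "path": "/tmp/export/out/"}
sftp_put = {
    "stage": "SFTP Put",
    "source_path": "/tmp/export/out",
    "dest_dir": "/upload",
}


def paths_match(sink_path: str, action_source_path: str) -> bool:
    """Return True when both settings refer to the same directory.

    Paths are normalized first, so a trailing slash does not matter.
    """
    return normpath(sink_path) == normpath(action_source_path)


if __name__ == "__main__":
    # Compare the File sink output path with the SFTP Put source path.
    print(paths_match(file_sink["path"], sftp_put["source_path"]))
```

If the check fails, the SFTP Put action would be looking in a directory that the File sink never wrote to, so there would be nothing to upload.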
Upvotes: 5