hanane

Reputation: 63

Some questions about Google Data Fusion

I am just getting started with the tool and I have some questions:

- What exactly is meant by the type File (in Source and Sink)?
- Is it also possible to send the result of the pipeline directly to an FTP server?

I checked the documentation, but I did not find this information.

Thank you.

Upvotes: 1

Views: 690

Answers (1)

Sree

Reputation: 714

Short answer: File refers to the filesystem where the pipeline runs. In the Data Fusion context, if you use the File sink, the contents are written to HDFS on the Dataproc cluster.

Data Fusion has an SFTP Put action that can be used to write to SFTP. Here is a simple pipeline showing how to write to SFTP from GCS.


Step 1: GCS Source to File Sink. This writes the content from GCS to HDFS on Dataproc when the pipeline runs.

Step 2: SFTP Put action, which takes the output of the File sink and uploads it to SFTP.

You need to configure the output path of the File sink to be the same as the source path in the SFTP Put action.
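To make the path wiring concrete, here is a minimal Python sketch that models the two steps as plain dictionaries. It is only an illustration: the keys (`output_path`, `source_path`, `destination_path`), the helper functions, the bucket, host, and paths are all hypothetical and are not the real Data Fusion plugin property names. It just shows the one constraint that matters, namely that the File sink output path and the SFTP Put source path must be the same.

```python
# Hypothetical, simplified model of the two-step pipeline described above.
# The keys below are illustrative only; they are NOT the real Data Fusion
# (CDAP) plugin property names.

def build_pipeline(gcs_path, staging_path, sftp_host, sftp_dest):
    """Return the two steps as plain dicts so the path wiring is explicit."""
    file_sink = {
        "step": "GCS Source -> File Sink",
        "input": gcs_path,
        "output_path": staging_path,  # written to HDFS on the Dataproc cluster
    }
    sftp_put = {
        "step": "SFTP Put action",
        "source_path": staging_path,  # must match the File sink output path
        "host": sftp_host,
        "destination_path": sftp_dest,
    }
    return file_sink, sftp_put


def paths_are_wired(file_sink, sftp_put):
    """Check the one constraint from the answer: sink output == SFTP source."""
    return file_sink["output_path"] == sftp_put["source_path"]


# Example values are placeholders, not real resources.
sink, put = build_pipeline(
    gcs_path="gs://example-bucket/input/",
    staging_path="/tmp/pipeline-output",
    sftp_host="sftp.example.com",
    sftp_dest="/upload/",
)
print(paths_are_wired(sink, put))
```

In the actual Data Fusion UI you would set the equivalent fields in the File sink and SFTP Put action configuration; a check like this is just a way to reason about the setup before running the pipeline.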

Upvotes: 5
