user929287171

Reputation: 91

Cloud Data Fusion vs Dataproc

Cloud Data Fusion lets us build ETL jobs in a graphical pipeline UI, whereas Dataproc lets us run existing Spark/Hadoop/Hive jobs.

With my limited experience of both services, I have found Cloud Data Fusion to be the easier of the two to use and manage. In which use cases is creating and running jobs in Dataproc preferable to Cloud Data Fusion?

Upvotes: 3

Views: 2617

Answers (1)

guillaume blaquiere

Reputation: 75715

You asked for an opinion, so your question should be closed...

Anyway, it mainly depends on what you prefer! If you are a developer and you want to handle, manage, and customize or tweak every step of your pipeline in code, for performance, observability, or security reasons, Dataproc is better for you. The same applies if all your developers already know the Hadoop ecosystem.
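To give an idea of that code-level control, here is a minimal sketch that assembles a `gcloud dataproc jobs submit pyspark` command with explicit Spark tuning properties. The helper `build_submit_command`, the bucket, the cluster name, the region, and the property values are all hypothetical placeholders, not a recommended configuration.

```python
import shlex


def build_submit_command(job_uri, cluster, region, spark_properties):
    """Build a `gcloud dataproc jobs submit pyspark` command as an argument list."""
    # gcloud takes Spark properties as one comma-separated key=value list.
    props = ",".join(f"{key}={value}" for key, value in sorted(spark_properties.items()))
    command = [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", job_uri,
        f"--cluster={cluster}",
        f"--region={region}",
    ]
    if props:
        command.append(f"--properties={props}")
    return command


# Hypothetical job location, cluster, and tuning values, for illustration only.
command = build_submit_command(
    "gs://example-bucket/jobs/etl_job.py",
    "example-cluster",
    "us-central1",
    {"spark.executor.memory": "4g", "spark.sql.shuffle.partitions": "200"},
)
print(shlex.join(command))
```

With hand-written jobs, every such knob (executor sizing, shuffle partitions, and so on) is yours to set per job, which is the kind of tweaking a graphical pipeline tends to hide.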

If you prefer to focus on data transformation and wrangling with a low-code/no-code solution, Cloud Data Fusion is for you, especially if you have few or no development skills (business users, for example).

In the end, all the pipelines will run on Dataproc anyway.

Upvotes: 3
