How to migrate On Prem Hadoop to GCP

Question

I am trying to migrate our organization's Hadoop jobs to GCP... I am unsure whether to use GCP Dataflow or Dataproc...

I want to reuse the Hadoop jobs we have already created and minimize cluster management as much as possible. We also want to be able to persist data beyond the life of the cluster...

Can anyone suggest

skjagini · Accepted Answer

I would start with Dataproc, as it is very close to what you already have.

Check out Dataproc initialization actions (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/init-actions), then create a simple cluster and get a feel for it.
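As a rough illustration of "create a simple cluster and get a feel for it", here is a minimal Python sketch that only builds the `gcloud dataproc` command lines for creating a cluster, submitting an existing Hadoop JAR, and deleting the cluster afterwards. It does not run anything. The cluster name, region, bucket, and file paths are hypothetical placeholders, and reading and writing through `gs://` paths is my assumption about how you would keep data after the cluster is gone, not something prescribed above.

```python
import shlex


def create_cluster_cmd(cluster, region, init_action=None, num_workers=2):
    """Build a `gcloud dataproc clusters create` command as an argv list."""
    cmd = [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
        f"--num-workers={num_workers}",
    ]
    if init_action:
        # Initialization actions are scripts (stored in Cloud Storage)
        # that run on each node when the cluster is created.
        cmd.append(f"--initialization-actions={init_action}")
    return cmd


def submit_hadoop_jar_cmd(cluster, region, jar, job_args):
    """Build a `gcloud dataproc jobs submit hadoop` command for an existing JAR."""
    cmd = [
        "gcloud", "dataproc", "jobs", "submit", "hadoop",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--jar={jar}",
        "--",  # everything after this is passed to the job itself
    ]
    return cmd + list(job_args)


def delete_cluster_cmd(cluster, region):
    """Build a `gcloud dataproc clusters delete` command (no confirmation prompt)."""
    return [
        "gcloud", "dataproc", "clusters", "delete", cluster,
        f"--region={region}", "--quiet",
    ]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    cluster, region = "demo-cluster", "us-central1"
    bucket = "gs://my-bucket"

    commands = [
        create_cluster_cmd(cluster, region, init_action=f"{bucket}/init/setup.sh"),
        # Assumption: input and output live in Cloud Storage, so the data
        # is still there after the cluster is deleted.
        submit_hadoop_jar_cmd(
            cluster, region, f"{bucket}/jobs/wordcount.jar",
            [f"{bucket}/input", f"{bucket}/output"],
        ),
        delete_cluster_cmd(cluster, region),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands first lets you review them before pasting them into a shell where the Google Cloud CLI is installed and authenticated. Passing the lists to `subprocess.run` instead would execute them directly.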

Dataflow is fully managed and you don't operate any cluster resources, but you cannot migrate an on-premises cluster to Dataflow as is; you need to migrate (and sometimes rewrite) your Hive, Pig, Oozie, etc. jobs.

Cost for Dataflow is also calculated differently: there is no upfront cost compared with Dataproc, but every time you run a job on Dataflow you incur a cost for that run.
