user3858193

Reputation: 1518

Sqoop vs Informatica Big Data edition for Data sourcing

I have the option of using either Sqoop or Informatica Big Data Edition to source data into HDFS. The source systems are Teradata and Oracle.

I would like to know which one is better, and why.

Note: My current utility can pull data into HDFS using Sqoop, and create a Hive staging table and an archive external table.

Informatica is the ETL tool used in the organization.

Regards, Sanjeeb

Upvotes: 0

Views: 3627

Answers (4)

Shaounak Nasikkar

Reputation: 314

Use Sqoop for the data exchange. It has a lot of options you can tune to get optimal performance. Also, if you exchange data along the path RDBMS (Teradata/Oracle) <-> Informatica <-> Hadoop cluster, the data first has to be brought to the Informatica server, which may involve additional I/O.
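To give an idea of the kind of options meant here, this is a minimal sketch that only builds (and does not run) a `sqoop import` command line with a few commonly tuned flags. The helper name, host, table, column and directory are all hypothetical, and credentials (`--username`, `--password-file`) are left out for brevity. Which values work best depends on your source database and cluster.

```python
from typing import List


def sqoop_import_args(jdbc_url: str, table: str, target_dir: str,
                      split_by: str, mappers: int = 4,
                      fetch_size: int = 10000) -> List[str]:
    """Build the argument list for a tuned `sqoop import` (hypothetical helper)."""
    if mappers < 1:
        raise ValueError("mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--split-by", split_by,            # column used to divide rows among mappers
        "--num-mappers", str(mappers),     # number of parallel map tasks
        "--fetch-size", str(fetch_size),   # rows fetched per database round trip
        "--compress",                      # compress the files written to HDFS
    ]


# All values below are made up for illustration.
args = sqoop_import_args(
    "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL",
    "SALES.ORDERS", "/data/staging/orders",
    split_by="ORDER_ID", mappers=8)
print(" ".join(args))
```

The resulting list can be handed to `subprocess.run` on a machine where the Sqoop client is installed.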

If the data processing must be done within Hive, use Informatica BDE.

Upvotes: 0

Varun

Reputation: 16

The tool-versus-hand-coding debate has always been there. Informatica provides an enterprise-level solution that is easier to maintain.

BDM 10.1.1 supports Sqoop with the Spark engine. Spark 2.0.1 is supported in this version, so performance is pretty good. BDM 10.2 has just been released with new features such as stateful variable support, which was missing in earlier versions.

Upvotes: 0

Volamr

Reputation: 66

Although this was asked a year ago, I am sharing the new features in Informatica.

Informatica BDM version 10.1 supports Sqoop connectivity, i.e. you can use Sqoop to read data from an RDBMS and load it into Hadoop/Hive.

Also, there are many new features in BDM version 10.2, especially parameterization support in the Developer tool and dynamic mappings.

Upvotes: 1

akshat thakar

Reputation: 1526

Sqoop

  • Sqoop can perform full and incremental loads from Oracle/Teradata.
  • Sqoop copies data from source systems in parallel.
  • Sqoop scripts can be custom generated and scheduled by Oozie.
  • Open source solution for any size cluster. No license cost.
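As a rough illustration of the full versus incremental loading and script generation mentioned above, here is a minimal sketch that generates the `sqoop import` arguments for either case. The function name, connection string, table and column are hypothetical; `--incremental append`, `--check-column` and `--last-value` are standard Sqoop import options. The sketch only builds the command and does not run it.

```python
from typing import List, Optional


def incremental_import_args(jdbc_url: str, table: str, target_dir: str,
                            check_column: str,
                            last_value: Optional[str] = None) -> List[str]:
    """Return arguments for a full load, or an incremental one if last_value is set."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
    ]
    if last_value is not None:
        # Only import rows whose check column is greater than the last imported value.
        args += [
            "--incremental", "append",
            "--check-column", check_column,
            "--last-value", last_value,
        ]
    return args


# Hypothetical connection details, for illustration only.
URL = "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL"
full = incremental_import_args(URL, "SALES.ORDERS", "/data/staging/orders", "ORDER_ID")
delta = incremental_import_args(URL, "SALES.ORDERS", "/data/staging/orders",
                                "ORDER_ID", last_value="1500000")
print(" ".join(full))
print(" ".join(delta))
```

A scheduler would need to keep track of the last imported value between runs; how that is stored is left open here.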

Informatica

  • Best interface in the ETL industry for managing mappings.
  • Does not provide parallel copy options. Provides a Hive mode for parallel processing, which basically converts transformations into Hive queries for execution. Also supports pushdown to generate MR code.
  • Licensing cost is per node. If you plan to scale the cluster to 500 Hadoop nodes for future data storage, you pay 10 times as much as for a 50-node cluster.
  • Informatica BDE is a relatively new product on the market. Informatica developer skills will be useful for working on big data. There are challenges in supporting all the latest Hadoop platform features in Informatica, and also traditional RDBMS features such as sequence generation, stateful mappings, sessions, and the Lookup transformation in Informatica BDE.
  • Informatica MDM does not support Hadoop.
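The licensing point in the list above is simple arithmetic, sketched below. It assumes a strictly linear per-node price with no volume discount, which is how the cost is described here; the price figure is a made-up placeholder, not an actual Informatica quote.

```python
def license_cost(nodes: int, price_per_node: float) -> float:
    """Total license cost under an assumed flat per-node price."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * price_per_node


PRICE_PER_NODE = 1000.0  # hypothetical placeholder price

small = license_cost(50, PRICE_PER_NODE)
large = license_cost(500, PRICE_PER_NODE)
ratio = large / small
print(ratio)  # 10.0, independent of the placeholder price
```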

If price is the deciding criterion, go for Sqoop. If you want the flexibility of switching Hadoop platform tools, use Sqoop (the Sqoop project is also considering a move to Spark). If you are tied to Informatica for some reason, go for Informatica. But most Informatica developers want to move to Hadoop technologies.

Upvotes: 2
