Andrew C.

Reputation: 410

What does Sqoop 2 provide that Sqoop 1 does not?

According to sqoop.apache.org, Sqoop 2 is not feature complete and should not be used for production systems. Fair enough; some people may want to try out Sqoop 2's new features in their test environments.

Cloudera has a feature comparison between Sqoop 1 and Sqoop 2 (https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_ig_sqoop_vs_sqoop2.html), but according to the page there is nothing that Sqoop 2 provides that Sqoop 1 does not also provide.

So why would anyone use Sqoop 2 in its current form? Does it provide any advantages over Sqoop 1? If not, why is it available for use? Thanks in advance!

Upvotes: 4

Views: 5230

Answers (3)

Mehdi LAMRANI

Reputation: 11597

Just a quick note:

According to Cloudera (as of Nov 2017):

Note: Sqoop 2 is being deprecated. Cloudera recommends using Sqoop 1.

Upvotes: 10

Ani Menon

Reputation: 28247

Some of the features expected in the Sqoop2 stable release:

  1. An easy-to-use GUI in addition to the existing command line.
  2. Security fixes, such as no longer sharing passwords openly.
  3. Easier debugging with better logging.
  4. Support for connectors that don't follow the JDBC model.

Currently there are no stable releases of Sqoop 2 available. However, you can build the latest source to test the product and contribute to the open-source project (if interested).


Refer:

Sqoop2 proposal

Features and releases

Upvotes: 4

Aditya Agarwal

Reputation: 753

Apache Sqoop uses a client model, where the user needs to install Sqoop along with the connectors/drivers on the client. Sqoop2 uses a service-based model, where the connectors/drivers are installed on the Sqoop2 server. Also, all configuration needs to be done on the Sqoop2 server.

From an MR perspective, another difference is that Sqoop submits a map-only job, while Sqoop2 submits a MapReduce job in which the mappers transport the data from the source and the reducers transform the data according to the source specified. This provides a clean abstraction. In Sqoop, both the transport and the transformation are done by the mappers only.
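
To make the split concrete, here is a toy sketch of the two job shapes described above. It is plain Python, not Sqoop or Hadoop code; `transform`, `map_only_job` and `map_reduce_job` are hypothetical names, and the lower-casing transformation is just an assumed stand-in.

```python
# Toy model only: plain Python functions standing in for map and reduce
# phases. None of these names come from Sqoop or Hadoop.

def transform(row):
    """Hypothetical transformation: normalise column names to lower case."""
    return {key.lower(): value for key, value in row.items()}


def map_only_job(source_rows):
    """Sqoop style as described above: each mapper both reads and transforms."""
    return [transform(row) for row in source_rows]


def map_reduce_job(source_rows):
    """Sqoop2 style as described above: mappers move data, reducers transform."""
    transported = [dict(row) for row in source_rows]  # map phase: transport only
    return [transform(row) for row in transported]    # reduce phase: transform only


rows = [{"ID": 1, "Name": "a"}, {"ID": 2, "Name": "b"}]
print(map_only_job(rows) == map_reduce_job(rows))
```

Both shapes give the same rows; the difference is only where the transformation step lives.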

Another major difference in Sqoop2 is from a security perspective. The administrator sets up the connections to the sources and targets, while the operator uses the already established connections, so the operator need not know the connection details. Operators are given access only to the connectors they require.
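
A rough sketch of that admin/operator split, assuming a simple in-memory registry. This is not the Sqoop2 API; `LinkRegistry`, `create_link`, `grant` and `start_job` are made-up names, and the JDBC URL and credentials are placeholders.

```python
class LinkRegistry:
    """Hypothetical server-side store of connections (not a Sqoop2 class)."""

    def __init__(self):
        self._links = {}   # link name -> connection details, kept on the server
        self._grants = {}  # link name -> set of operators allowed to use it

    def create_link(self, role, name, jdbc_url, username, password):
        if role != "admin":
            raise PermissionError("only an administrator may create links")
        self._links[name] = {"url": jdbc_url, "user": username, "password": password}
        self._grants[name] = set()

    def grant(self, role, name, operator):
        if role != "admin":
            raise PermissionError("only an administrator may grant access")
        self._grants[name].add(operator)

    def start_job(self, operator, link_name, table):
        if operator not in self._grants.get(link_name, set()):
            raise PermissionError(f"{operator} may not use link {link_name}")
        # The server would open the connection here; the operator only
        # ever refers to the link by name and never sees the credentials.
        return f"job started: table={table} via link={link_name}"


registry = LinkRegistry()
registry.create_link("admin", "sales_db", "jdbc:mysql://db.example.com/sales", "etl", "s3cret")
registry.grant("admin", "sales_db", "alice")
print(registry.start_job("alice", "sales_db", "orders"))
```

An operator without a grant on `sales_db` gets a `PermissionError`, which mirrors the point that operators only get access to the connections they need.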

Upvotes: 4
