Roshan Bagdiya

Reputation: 2178

Number of reducers in Sqoop

How many mappers and reducers does Sqoop use by default? (4 mappers, 0 reducers.)

If a --where or --query condition is used in a Sqoop import, how many reducers will there be?

On my local cluster it shows 0 reducers after using a --where or --query condition.

Upvotes: 1

Views: 7411

Answers (4)

Jude

Reputation: 1

For most operations, Sqoop runs as a map-only job. Even if there are aggregations in a free-form query, that query is executed on the RDBMS, so no reducers are needed. However, for the "--incremental lastmodified" option, reducer(s) are invoked if "--merge-key" is specified (used for merging the new incremental data with the previously extracted data). In this case there is also a way to set the number of reducers, via the property "mapreduce.job.reduces", as below.

sqoop import -Dmapreduce.job.reduces=3 \
  --incremental lastmodified \
  --connect jdbc:mysql://localhost/testdb \
  --table employee \
  --username root --password cloudera \
  --target-dir /user/cloudera/SqoopImport \
  --check-column trans_dt \
  --last-value "2019-07-05 00:00:00" \
  --merge-key emp_id

Note that "-D" properties must appear before the other command options.

Upvotes: 0

Tutu Kumari

Reputation: 503

Reducers are required for aggregation. While fetching data from MySQL, Sqoop simply issues select queries, which are handled by the mappers.

There are no reducers in Sqoop. Sqoop uses only mappers, since it performs parallel import and export. Whenever we write any query (even an aggregation such as count or sum), the query runs on the RDBMS; the mappers then fetch the generated result from the RDBMS using select queries and load it into Hadoop in parallel. Hence the where clause, or any aggregation query, runs on the RDBMS, and no reducers are required.
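For example, a hypothetical import with a --where filter (the connection string, table, and column names here are purely illustrative) still launches only map tasks, because the filter is pushed down into the SELECT statement each mapper issues against the database:

sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password cloudera \
  --table employee \
  --where "dept_id = 10" \
  --target-dir /user/cloudera/employee_dept10

The WHERE clause is evaluated by MySQL, not by a reduce phase, so the job counters will report 0 reduce tasks.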

Upvotes: 1

Dev

Reputation: 13753

Sqoop jobs are map-only. There is no reducer phase.

For example, a Sqoop import from MySQL to HDFS with 4 mappers opens 4 concurrent connections and starts fetching data; 4 map tasks are created, and the data is written to HDFS part files. There is no reducer stage.
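You can see this in the output layout: a map-only job produces part-m-* files (reduce output would be named part-r-*). With the default of four mappers, a target directory such as the illustrative one below ends up with one part file per map task:

hdfs dfs -ls /user/cloudera/employee
  part-m-00000
  part-m-00001
  part-m-00002
  part-m-00003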

Upvotes: 1

Vijay_Shinde

Reputation: 1352

As per the Sqoop user guide, Sqoop imports data in parallel from most database sources. You can specify the number of map tasks (parallel processes) used to perform the import with the --num-mappers argument. By default, four tasks are used. Since Sqoop does not perform any aggregation, the number of reducer tasks is zero. For more details see http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports
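A sketch of overriding the default parallelism (connection details and the emp_id column are illustrative assumptions): raise --num-mappers and, if the table has no primary key, tell Sqoop which column to split the ranges on with --split-by:

sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password cloudera \
  --table employee \
  --num-mappers 8 \
  --split-by emp_id \
  --target-dir /user/cloudera/employee

Each of the 8 mappers imports one range of emp_id values; there is still no reduce phase.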

Upvotes: 2

Related Questions