Reputation: 2178
How many default mappers and reducers are there in Sqoop? (4 mappers, 0 reducers.)
If a --where or --query condition is used in a sqoop import, how many reducers will there be?
On my local cluster it shows 0 reducers after using a --where or --query condition.
Upvotes: 1
Views: 7411
Reputation: 1
For most operations, Sqoop runs a map-only job. Even if the free-form query contains aggregations, that query is executed on the RDBMS, so no reducers are needed. However, for the "--incremental lastmodified" option, reducer(s) are launched when "--merge-key" is specified (it is used to merge the new incremental data with the previously extracted data). In that case there is also a way to set the number of reducers via the "mapreduce.job.reduces" property, as below.
sqoop import -Dmapreduce.job.reduces=3 \
  --incremental lastmodified \
  --connect jdbc:mysql://localhost/testdb \
  --table employee \
  --username root --password cloudera \
  --target-dir /user/cloudera/SqoopImport \
  --check-column trans_dt \
  --last-value "2019-07-05 00:00:00" \
  --merge-key emp_id
The "-D" properties are expected before the command options.
Upvotes: 0
Reputation: 503
Reducers are required for aggregation. While fetching data from MySQL, Sqoop simply issues SELECT queries, and those are handled by the mappers.
There are no reducers in Sqoop; it uses only mappers, because it performs parallel import and export. Whatever query you write (even an aggregation such as COUNT or SUM) runs on the RDBMS; the mappers fetch the generated result from the RDBMS with SELECT queries and load it into Hadoop in parallel. So the WHERE clause, or any aggregation, runs on the RDBMS, and no reducers are required.
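To make this concrete, here is a minimal sketch of a free-form query import (the connection string, table, and column names are made-up placeholders): the WHERE filter and the SUM() both execute inside MySQL, Sqoop substitutes $CONDITIONS so each mapper pulls its own split of the result, and no reduce stage ever runs.
# Hypothetical free-form query import: filtering and aggregation happen in MySQL;
# the mappers only SELECT their split of the result and write it to HDFS.
sqoop import \
  --connect jdbc:mysql://localhost/salesdb \
  --username retail_user -P \
  --query 'SELECT store_id, SUM(amount) AS total FROM sales WHERE sale_dt >= "2019-01-01" AND $CONDITIONS GROUP BY store_id' \
  --split-by store_id \
  --target-dir /user/hadoop/sales_totals \
  --num-mappers 4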
Upvotes: 1
Reputation: 13753
Sqoop jobs are map only. There is no reducer phase.
For example, a sqoop import from MySQL to HDFS with 4 mappers opens 4 concurrent connections and starts fetching data. Four map tasks are created and the data is written to HDFS part files. There is no reducer stage.
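A minimal sketch of such an import (database, credentials, and paths here are assumptions): four map tasks each open their own JDBC connection, and the output appears as four part-m-* files with no reduce output at all.
# Hypothetical import: 4 mappers, 4 concurrent connections, map-only output.
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root -P \
  --table employee \
  --target-dir /user/hadoop/employee \
  --num-mappers 4
# Expected result in HDFS (no part-r-* files, only map output):
#   /user/hadoop/employee/part-m-00000 ... part-m-00003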
Upvotes: 1
Reputation: 1352
As per the Sqoop user guide, Sqoop imports data in parallel from most database sources. You can specify the number of map tasks (parallel processes) used to perform the import with the --num-mappers argument. By default, four tasks are used. Since no aggregation is being done, the number of reduce tasks is zero. For more details see http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports
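For instance (the table, column, and connection details below are only illustrative), the parallelism can be changed with --num-mappers, and Sqoop splits the work on the primary key unless another column is given via --split-by; the reduce task count stays at zero either way.
# Hypothetical import with 8 parallel map tasks instead of the default 4.
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root -P \
  --table orders \
  --split-by order_id \
  --num-mappers 8 \
  --target-dir /user/hadoop/orders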
Upvotes: 2