Reputation: 337
I am using DSE 3.2.4. I have created three tables: one with 10M rows, one with 50k rows, and one with just 10 rows. When I run a simple Pig or Hive query over these tables, it runs the same number of mappers for all of them.
In Pig, `pig.splitCombination` is true by default, in which case it runs only one map. If I set it to false, it runs 513 maps.
In Hive, it runs 513 maps by default.
I tried setting the following properties, but had no luck:
`mapred.min.split.size=134217728` in `mapred-site.xml`: it now runs 513 maps for all the tables.
`set pig.splitCombination=false` in the Pig shell: it now runs only 1 map for all the tables.
Finally, I found `mapred.map.tasks = 513` in `job.xml`. I tried to change it in `mapred-site.xml`, but the change is not reflected in the job.
Please help me with this.
Upvotes: 0
Views: 63
Reputation: 310
The number of mappers is determined by the split size, so don't configure it through the Hadoop settings. For Pig, try passing `&split_size=` in your Pig URL; for Hive, set `cassandra.input.split.size`. The default is 64M.
If your Cassandra cluster uses vnodes, it creates many splits, so if your data is not big enough, turn off vnodes on the Hadoop nodes.
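As a rough sketch of the Pig side, the split size can be appended to the load URL as a query parameter. The keyspace `my_ks`, the table `big_table`, the use of the `cql://` scheme with `CqlStorage`, and the value shown are all assumptions for illustration; check the units and a sensible value against the documentation for your DSE version.

```pig
-- Hypothetical keyspace and table; the split_size value is illustrative only.
-- A larger split size should yield fewer input splits, and so fewer map tasks.
rows = LOAD 'cql://my_ks/big_table?split_size=1000000' USING CqlStorage();

-- Simple job to check how many map tasks get launched.
grouped = GROUP rows ALL;
total = FOREACH grouped GENERATE COUNT(rows);
DUMP total;
```

If the URL already has other parameters, add it with `&split_size=...` instead. For Hive, the equivalent would be running `set cassandra.input.split.size=<value>;` in the Hive shell before the query.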
Upvotes: 1