sudheer

Reputation: 337

Pig and Hive connectivity to DataStax Cassandra runs a huge number of maps

I am using DSE 3.2.4. I have created three tables: one with 10M rows, one with 50k rows, and one with just 10 rows. When I run a simple Pig or Hive query over these tables, it runs the same number of mappers for all of them.

In Pig, `pig.splitCombination` is true by default, in which case only one map runs. If I set it to false, 513 maps run.

In Hive, 513 maps run by default.

I tried setting the following properties:

`mapred.min.split.size=134217728` in `mapred-site.xml`: still 513 maps for all tables

`set pig.splitCombination=true` in the Pig shell: only 1 map for all the tables

But no luck.

Finally, I found `mapred.map.tasks = 513` in job.xml.

I tried to change this in `mapred-site.xml`, but the change is not reflected in the job.

Please help me with this.

Upvotes: 0

Views: 63

Answers (1)

alexliu68

Reputation: 310

The number of mappers is determined by the split size, so don't configure it through Hadoop settings. For Pig, try passing `&split_size=` in your Pig URL. For Hive, set `cassandra.input.split.size`.

The default is 64M.

If your Cassandra cluster uses vnodes, it creates many splits even for small tables. If your data is not big enough to need that many, turn off vnodes on the Hadoop nodes.
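To see why the map count can stay at 513 no matter how large the table is, here is a rough sketch in Python. It is a simplified model, not the actual DSE code: it assumes every token range produces at least one split, that rows are spread evenly across ranges, and that the split size counts rows. The function name `estimate_splits`, the split size of 100,000, and the 3-range non-vnode cluster are all hypothetical values chosen for illustration; only the 513 figure and the three table sizes come from the question.

```python
import math


def estimate_splits(token_ranges, total_rows, split_size):
    """Estimate the number of input splits (one map task per split).

    Simplified model (an assumption, not the real input format code):
    each token range yields at least one split, and a range is only
    subdivided when it holds more rows than split_size.
    """
    rows_per_range = total_rows / token_ranges
    splits_per_range = max(1, math.ceil(rows_per_range / split_size))
    return token_ranges * splits_per_range


# Table sizes taken from the question.
TABLE_ROWS = {"large": 10_000_000, "medium": 50_000, "tiny": 10}

# Hypothetical split size (in rows), for illustration only.
SPLIT_SIZE = 100_000

if __name__ == "__main__":
    # 513 ranges mirrors the map count in the question; 3 ranges is a
    # hypothetical cluster with vnodes turned off.
    for label, ranges in (("vnodes", 513), ("no vnodes", 3)):
        for name, rows in TABLE_ROWS.items():
            count = estimate_splits(ranges, rows, SPLIT_SIZE)
            print(f"{label:10s} {name:7s} rows={rows:>10d} splits={count}")
```

Under this model, a cluster with many token ranges never drops below one map per range, which is why raising the split size alone may not reduce the count when vnodes are enabled.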

Upvotes: 1
