I have set up a two-node Cassandra cluster and am trying to run queries using Shark. Each query takes around 10 minutes, even though it completes successfully. (I used Cloudera to install the software.)
Time taken: 421.189 seconds
shark>
I tried to tune Shark by increasing some parameters (SPARK_MEM and SHARK_MASTER_MEM) in /opt/shark/shark/conf/shark-env.sh, but it made no difference.
I would appreciate any clue about what is causing this slowness.
Here are the versions of the software involved:
Cassandra: 2.0.8
Shark: shark-0.9.1-bin-cdh4.6.0-fe75a886
Spark: SPARK-0.9.0-1.cdh4.6.0.p0.98
Hadoop: 2.0.0-cdh4.7.0
Hardware Spec:
RAM: 256GB
CPU: 2x Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (Total 20 cores with HT)
Sorry, I can't comment yet, so this isn't an answer, just some thoughts about the problem. I have encountered a similar problem while testing a local setup with a single Cassandra node. The simplest query against a 10-row table
cqlsh:db> SELECT * FROM table;
takes less than a second in the CQL shell.
In Shark, however, it takes about 10 seconds:
shark> USE db; SELECT * FROM table;
...
Time taken: 11.274 seconds
There is a bin/shark-withinfo executable in the Shark directory that prints more information about each request. Maybe it will shed some light on your case. In my case it shows that a huge number of tasks is created to process my request (the log reports "progress: 255/257", i.e. 257 tasks for a 10-row table). So I'm guessing that the job scheduler eats most of the time, but I'm not quite sure:
...
14/07/09 17:35:19 INFO scheduler.TaskSetManager: Starting task 0.0:255 as TID 255 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 17:35:19 INFO scheduler.TaskSetManager: Serialized task 0.0:255 as 5456 bytes in 0 ms
14/07/09 17:35:19 INFO executor.Executor: Running task ID 255
14/07/09 17:35:19 INFO scheduler.TaskSetManager: Finished TID 254 in 30 ms on localhost (progress: 255/257)
14/07/09 17:35:19 INFO scheduler.DAGScheduler: Completed ResultTask(0, 254)
14/07/09 17:35:19 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/09 17:35:19 INFO rdd.HadoopRDD: Input split: localhost 9160 org.apache.cassandra.dht.Murmur3Partitioner
14/07/09 17:35:19 INFO cql.HiveCqlInputFormat: Validators : null
14/07/09 17:35:19 INFO exec.FileSinkOperator: Initializing Self 260 FS
14/07/09 17:35:19 INFO exec.FileSinkOperator: Operator 260 FS initialized
14/07/09 17:35:19 INFO exec.FileSinkOperator: Initialization Done 260 FS
14/07/09 17:35:19 INFO exec.FileSinkOperator: Final Path: FS file:...
14/07/09 17:35:19 INFO exec.FileSinkOperator: Writing to temp file: ...
14/07/09 17:35:19 INFO exec.FileSinkOperator: New Final Path: ...
14/07/09 17:35:19 INFO executor.Executor: Serialized size of result for 255 is 563
14/07/09 17:35:19 INFO executor.Executor: Sending result for 255 directly to driver
14/07/09 17:35:19 INFO executor.Executor: Finished task ID 255
...
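The guess above (many small tasks rather than a few slow ones) can be checked against the whole log instead of a single excerpt. Below is a minimal Python sketch, assuming the shark-withinfo output has been saved to a file (the script name and log file name are hypothetical) and that every finished task produces a "Finished TID ... in N ms on ... (progress: x/y)" line in the format shown above. It counts those lines, reads the total task count from "progress: x/y", and makes a naive serial estimate (total tasks times mean task duration).

```python
import re
import sys

# Matches scheduler lines like the one in the log above, e.g.
# "INFO scheduler.TaskSetManager: Finished TID 254 in 30 ms on localhost (progress: 255/257)"
FINISHED_RE = re.compile(
    r"Finished TID (\d+) in (\d+) ms on \S+ \(progress: (\d+)/(\d+)\)"
)


def summarize_tasks(log_lines):
    """Summarize the 'Finished TID' lines of a shark-withinfo log.

    Returns a dict with the number of finished-task lines seen, the total
    number of tasks in the stage (the y in "progress: x/y"), the summed task
    durations in ms, and a naive serial estimate (total tasks x mean duration).
    """
    durations = []
    total_tasks = 0
    for line in log_lines:
        match = FINISHED_RE.search(line)
        if match:
            durations.append(int(match.group(2)))
            # The denominator of "progress: x/y" is the number of tasks in the stage.
            total_tasks = max(total_tasks, int(match.group(4)))
    mean_ms = sum(durations) / len(durations) if durations else 0.0
    return {
        "finished_seen": len(durations),
        "total_tasks": total_tasks,
        "summed_ms": sum(durations),
        "serial_estimate_ms": total_tasks * mean_ms,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize_tasks.py shark-withinfo.log
    with open(sys.argv[1]) as log_file:
        stats = summarize_tasks(log_file)
    for key, value in stats.items():
        print(f"{key}: {value}")
```

If every task took about as long as the single one shown (30 ms), 257 tasks run one after another would add up to roughly 7.7 seconds, which is in the same range as the 11-second query time. That is consistent with the scheduler-overhead guess but does not prove it, since tasks can overlap and per-task times vary.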